bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4195> which encodes the amino acid
sequence <SEQ ID 4196>. Analysis of this protein sequence reveals the following:

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1985 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
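Each "Final Results" block lists one certainty score per compartment. The compartment marked "Affirmative" carries the prediction.

The sketch below shows how such a block could be read programmatically. It assumes the report lines are available as plain text in exactly the format shown above. The helpers `parse_localization` and `predicted_site` are hypothetical illustrations and are not part of the prediction tool that generated these reports.

```python
import re
from typing import Optional

# Matches report lines such as "bacterial cytoplasm Certainty=0.1985 (Affirmative) < succ>".
# Hypothetical parser written for the text layout used in this document.
LINE = re.compile(r"bacterial\s+(\w+)\s.*?Certainty=\s*([\d.]+)\s*\(([^)]*)\)")


def parse_localization(text: str) -> list[tuple[str, float, str]]:
    """Return (compartment, certainty, call) tuples in report order."""
    return [(site, float(score), call) for site, score, call in LINE.findall(text)]


def predicted_site(text: str) -> Optional[str]:
    """Return the affirmative compartment with the highest certainty, if any."""
    affirmative = [row for row in parse_localization(text) if row[2] == "Affirmative"]
    if not affirmative:
        return None
    return max(affirmative, key=lambda row: row[1])[0]


report = """bacterial cytoplasm Certainty=0.1985 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>"""

print(predicted_site(report))
```

Choosing the highest affirmative score, rather than the first affirmative line, keeps the helper independent of the order in which compartments are listed. That order varies between the blocks in this document.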

An alignment of the GAS and GBS proteins is shown below. 

Identities = 22/34 (64%), Positives = 26/34 (75%)

Query: 28 SFGTIRNSTALKQLTLDSLNLLSFGTIRNSTALK 61

SFGTI+NS ALKQ + +N SFGTI+NS ALK 
Sbjct: 7 SFGTIQNSIALKQKAQEEINQRSFGTIQNSIALK 40 
Identities = 22/34 (64%), Positives = 26/34 (75%)

Query: 6 SFGTIRNSTALKLYAKQSPAFRSFGTIRNSTALK 39

SFGTI+NS ALK A++ RSFGTI+NS ALK 
Sbjct: 7 SFGTIQNSIALKQKAQEEINQRSFGTIQNSIALK 40 
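Each summary line above reports raw counts over the aligned length, followed by a printed percentage.

The minimal sketch below extracts those counts and recomputes each percentage to one decimal place. It assumes only the "Name = count/length" layout shown here, and the helper names are hypothetical. It does not try to reproduce the rounding the alignment program applied to the printed percentages.

```python
import re

# Matches "Identities = 22/34", "Positives = 26/34" or "Gaps = 2/104" in a summary line.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_summary(line: str) -> dict[str, tuple[int, int]]:
    """Map each reported field to its (count, aligned_length) pair."""
    return {name: (int(count), int(length)) for name, count, length in FIELD.findall(line)}


def percent(count: int, length: int) -> float:
    """Recompute a percentage from raw counts, rounded to one decimal place."""
    return round(100 * count / length, 1)


summary = parse_summary("Identities = 22/34 (64%), Positives = 26/34 (75%)")
for name, (count, length) in summary.items():
    print(name, count, length, percent(count, length))
```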


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1372 

A DNA sequence (GBSx1457) was identified in S.agalactiae <SEQ ID 4197> which encodes the amino
acid sequence <SEQ ID 4198>. Analysis of this protein sequence reveals the following:

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1407 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4199> which encodes the amino acid
sequence <SEQ ID 4200>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2055 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 154/221 (69%), Positives = 187/221 (83%)

Query: 1 MIKINFPILDEPLVLSNATILTIEDVSVYSSLVKHFYQYDVDEHLKLFDDKQKSLKATEL 60 
++ +NF +LDEP+ L TIL +EDV V+S +V++ YQY+ D LK FD K K++K +E+ 
Sbjct: 8 LMNLNFSLLDEPIPLRGGTILVLEDVCVFSKIVQYCYQYEEDSELKFFDHKMKTIKESEI 67

Query: 61 MLVTDILGYDVNSAPILKLIHGDLENQFNEKPEVKSMVEKLAATITELIAFECLENELDL 120

MLVTDILG+DVNS + ILKLIH DLE+QFNEKPEVKSM++KL ATITELI FECLENELDL
Sbjct: 68 MLVTDILGFDVWSSTILKLIHADLESQFNEKPEVKSMIDKLVATITELIVFECLENELDL 127 

Query: 121 EYDEIKILELIKALGVKIETQSDTIFEKCFEIIQVYHYLTKKNLLVFVNSGAYLTKDEVI 180

EYDEI ILELIK+LGVK+ETQSDTIFEKC EI+Q++ YLTKK LL+FVNSGA+LTKDEV
Sbjct: 128 EYDEITILELIKSLGVKVETQSDTIFEKCLEILQIFKYLTKKKLLIFVNSGAFLTKDEVA 187

Query: 181 KLCEYINLMQKSVLFLEPRRLYDLPQYVIDKDYFLIGENMV 221 

L EYI+L +VLFLEPR LYD PQY++D+DYFLI +NMV 
Sbjct: 188 SLQEYISLTNLTVLFLEPRELYDFPQYILDEDYFLITKNMV 228 
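Each Query and Sbjct row is bracketed by the positions of its first and last residues. The end coordinate should therefore equal the start coordinate plus the number of residues in the row, minus one. Gap characters ("-") do not advance the coordinate.

This text was transcribed from a scanned document. A row whose residue count disagrees with its coordinates is therefore a useful sign of a transcription error. The minimal sketch below implements that check; the function names are hypothetical.

```python
def row_end(start: int, aligned: str) -> int:
    """Expected end coordinate of an alignment row.

    Only letters count as residues; gap characters ("-") and spaces are skipped.
    """
    residues = sum(1 for ch in aligned if ch.isalpha())
    return start + residues - 1


def row_is_consistent(start: int, aligned: str, end: int) -> bool:
    """True if the printed end coordinate matches the residues in the row."""
    return row_end(start, aligned) == end


# Query row 61..120 of the GAS/GBS alignment above (60 residues, no gaps).
print(row_is_consistent(61, "MLVTDILGYDVNSAPILKLIHGDLENQFNEKPEVKSMVEKLAATITELIAFECLENELDL", 120))
```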

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1373 

A DNA sequence (GBSx1458) was identified in S.agalactiae <SEQ ID 4201> which encodes the amino
acid sequence <SEQ ID 4202>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0842 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9783> which encodes amino acid sequence <SEQ ID 9784> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB83918 GB:AL162753 hypothetical protein NMA0629 [Neisseria 
meningitidis Z2491] 
Identities = 45/104 (43%), Positives = 65/104 (62%), Gaps = 2/104 (1%)

Query: 4 RYMRMILMFDMPTETAEERKAYRKFRKFLLSEGFIMHQFSVYSKLLLNNTANNAMIGRLK 63 
++MR+I+ FD+P TA +RKA +FR+FLL +G+ M Q SVYS+++ + RL 
Sbjct: 5 KFMRIIVFFDLPVITAAKRKAANQFRQFLLKDGYQMLQLSVYSRIVKGRDSLQKHHNRLC 64

Query: 64 VNNPKKGNITLLTVTEKQFARMVYLHGERNT--SVANSDSRLVF 105

N P++G+I L +TEKQ+A M L GE T NSD L+F 

Sbjct: 65 ANLPQEGSIRCLEITEKQYAAMKLLLGELITQEKKVNSDQLLLF 108

A related DNA sequence was identified in S.pyogenes <SEQ ID 4203> which encodes the amino acid 
sequence <SEQ ID 4204>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0822 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 97/112 (86%), Positives = 107/112 (94%)

Query: 1 MSYRYMRMILMFDMPTETAEERKAYRKFRKFLLSEGFIMHQFSVYSKLLLNNTANNAMIG 60
MSYRYMRMILMFDMPT+TAEERKAYRKFRKFLLSEGFIMHQFS+YSKLLLNNTANNAMIG
Sbjct: 1 MSYRYMRMILMFDMPTDTAEERKAYRKFRKFLLSEGFIMHQFSIYSKLLLNNTANNAMIG 60

Query: 61 RLKVNNPKKGNITLLTVTEKQFARMVYLHGERNTSVANSDSRLVFLGDSYDQ 112

RL+ +NP KGNITLLTVTEKQFARM+YLHGERN +ANSD RLVFLG+++D+ 
Sbjct: 61 RLREHNPNKGNITLLTVTEKQFARMIYLHGERNNCIANSDERLVFLGEAFDE 112

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1374 

A DNA sequence (GBSx1459) was identified in S.agalactiae <SEQ ID 4205> which encodes the amino
acid sequence <SEQ ID 4206>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3185 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB83919 GB:AL162753 hypothetical protein NMA0630 [Neisseria
meningitidis Z2491] 
Identities = 71/224 (31%), Positives = 122/224 (53%)

Query: 4 WRTVVVNTHSKLSYKNNHLIFKDSYQTEMIHLSEIDILIMETTDIVLSTMLIKRLVDENI 63

WR++++ KLS + L+ + + ++ + L +I ++I+E + +++ L+ L +
Sbjct: 3 WRSLLIQNGGKLSLQRRQLLIQQNGESHTVPLEDIAVIIIENRETLITAPLLSALAEHGA 62

Query: 64 LVIFCDDKKLPTAMLMPYYARHDSSLQLSRQMSWIEDVKADVWTSIIAQKILNQSFYLGE 123 

++ CD++ LP +PY H L Q++ E +K +W I+ QKILNQ+F E

Sbjct: 63 TLLTCDEQFLPCGQWLPYAQYHRQLKILKLQLNISEPLKKQLWQHIVRQKILNQAFVADE 122 

Query: 124 CSFFEKSQSIMNLYHDLEPFDPSNREGHAARIYFNTLFGNDFSREQDNPINAGLDYGYSL 183 

++ + L ++ D NRE AA +YF LFG F+R +N +NA L+Y Y++ 
Sbjct: 123 TGNDIAAKRLRTIASEWSGDTGNREAQAAALYFQALFGEKFTRNDNNAWAALNYTYAV 182 

Query: 184 LLSMFAREVVKCGCMTQFGLKHANQFNQFNLASDIMEPFRPIVD 227

L+AR+ G+ GLH++N FNLA D +EP RP+ D 
Sbjct: 183 LRAAVARALTLYGWLPALGLFHRSELNPFNLADDFIEPLRPLAD 226 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4207> which encodes the amino acid 
sequence <SEQ ID 4208>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3185 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 239/289 (82%), Positives = 271/289 (93%)

Query: 1 MAGWRTVVVNTHSKLSYKNNHLIFKDSYQTEMIHLSEIDILIMETTDIVLSTMLIKRLVD 60

MAGWRTVVVNTHSKLSYKNNHLIFKD+Y+TE+IHLSEI
Sbjct: 1 MAGWRTVVVNTHSKLSYKNNHLIFKDAYKra^ 60 



Query: 61 ENILVIFCDDKRLPTAMLMPYYARHDSSLQLSRQMSWIEDVKADVWTSIIAQKILNQSFY 120 
EN+LVIFCDDKRLPTAMLMP+Y RHDSSLQL +QMSW E VK+ VWT+IIAQKILNQS Y
Sbjct: 61 ENVLVIFCDDKRLPTAMLMPFYGRHDSSLQLGKQMSWSETVKSQVWTTIIAQKILNQSCY 120

Query: 121 LGECSFFEKSQSIMNLYHDLEPFDPSNREGHAARIYFNTLFGNDFSREQDNPINAGLDYG 180

LG CS+FEKSQSIM+LYH LE FDPSNREGHAARIYFNTLFGNDFSR+ ++PINAGLDYG
Sbjct: 121 LGACSYFEKSQSIMDLYHGLENFDPSNREGHAARIYFNTLFGNDFSRDLEHPINAGLDYG 180

Query: 181 YSLLLSMFAREVVKCGCMTQFGLKHANQFNQFNLASDIMEPFRPIVDRIIYENRQSDFVK 240

Y+LLLSMFAREVV GCMTQFGLKHANQFNQFN ASDIMEPFRP+VD+I+YENR F K
Sbjct: 181 YTLLLSMFAREVVVSGCMTQFGLKHANQFNQFNFASDIMEPFRPLVDKIVYENRNQPFPK 240

Query: 241 MKRELFSMFSETYSYNGKEMYLSNIVSDYTKKVIKSLNSDGNGIPEFRI 289 

+KRELF++FS+T+SYNGKEMYL+NI+SDYTKKV+K+LN++G G+PEFRI 
Sbjct: 241 IKRELFTLFSDTFSYNGKEMYLTNIISDYTKKVVKALNNEGKGVPEFRI 289

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1375 

A DNA sequence (GBSx1460) was identified in S.agalactiae <SEQ ID 4209> which encodes the amino
acid sequence <SEQ ID 4210>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1109 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB73943 GB:AL139078 hypothetical protein Cj1523c [Campylobacter
jejuni] 

Identities = 165/746 (22%), Positives = 291/746 (38%), Gaps = 115/746 (15%)

Query: 318 LSASMIQRYDEHREDLKQLKQFVKASLPEKYQEI--FADSSKDGYAGYIEGKTNQEAFYK 375

L+ S +R + L LK + Y++ F +S Y G + E ++ 

Sbjct: 50 LARSARKRLARRKARLNHLKHLIANEFKliNYEDYQSFDESIjyCAYKGSLISP--YELRFR 107

Query: 376 YLSKLLTKQEDSENFLE--KIKNEDFLRKQRTFDNGSIPHQVHLTELKAIIRRQS 428 

L++LL+KQ+ + L K + D ++ + G+I + E K + QS 

Sbjct: 108 ALNELLSKQDFARVILHIAKRRGYDDIKNSDDKEKGAILKAIKQNEEK-LANYQSVGEYL 166

Query: 429 --EYYPFLKENQDRIEKILTFRIPYYIGPLAREKSDFAW-MTRKTDDSI 474

EY+ KEN + + Y + + +++ +F + ++K ++ + 

Sbjct: 167 YKEYFQKFKENSKEFTNVRNKKESYERCIAQSFLKDELKLIFKKQREFGFSFSKKFEEEV 226

Query: 475 RPWNFEDLVDKEKSAFAFIHRMTNNDFYLPEEKVLPKHSLIYEKFTVYNELTKV--RYKN 532 

F +++ + F H + N F+ +EK PK+S + F + + KN 

Sbjct: 227 LSVAFY KRALKDFSHLVGNCSFFT-DEKRAPKNSPLAFMFVALTRIINLLNNLKN 280 

Query: 533 EQGETYFFDSNIKQEIFDGVFKEHRKVSK--KKLLDFLAKEYEEFRIVDVIGLDKENKAF 590

+GYD + + VK K KKLL L+ +YE E + 

Sbjct: 281 TEGILYTKDD--I^ALLNEVLKNGTLTYKQTKKIiLG-LSDDYE FKGEKGTY 328

Query: 591 NASLGTYHDLEKILDKDFLDNPDNESILEDIVQTLTLFEDREMIKKRLENYKDLFTESQL 650 

Y+KL+L D L+I + +TL +D +KK L Y ++Q+ 
Sbjct: 329 FIEFKKYKEFIKALGEHNLSQDD LNEIAKDITLIKDEIKLKKALAKYD--LNQNQI 382

Query: 651 KKLYRRHYTGWGRLSAKLINGIRDK--ESQKTILDYLIDDGRSNRNFMQLINDDGLSFKS 708 

L++ +SK++ E+K D+ + N IN+D F 

Sbjct: 383 DSLSKLEFKDHLNISFKALKLVTPLMLEGKK YDEACNELNLKVAINEDKKDFLP 436 

Query: 709 IISKAQAGSHSDNLKEVVGELAGSPAIKKGILQSLKIVDELVKVMGYEPEQIVVEMAREN 768
++ N P + + I + K+++ L+K G + +I +E+ARE
Sbjct: 437 AFNETYYKDEVTN PVVLRAIKEYRKVLNALLKKYG-KVHKINIELAREV 484

Query: 769 QTTNQGR RNSRQRYKLLDDG VKNLASDLNG-NILKEYPTDNQALQNERLFLYY 820 

+ R ++YKD +L+N NILK L L+ 

Sbjct: 485 GKNHSQRAKIEKEQNENYKAKKDAELECEKLGLKINSKNILK LRLFK 531

Query: 821 LQNGRDMYTGEALDIDNLSQ YDIDHIIPQAFIKDDSIDNRVLVSSAKNRGKSDDVPS 877 

Q Y+GE + I +L +IDHI P + DDS N+VLV + +N+ K + P 

Sbjct: 532 EQKEFCAYSGEKIKISDLQDEIMLEIDHIYPYSRSFDDSYMNKVLVFTKQNQEKMQTP- 590 

Query: 878 LEIVKDCKVFWKKL--LDAKLMSQRKYDNLTKAERGGLTSDDKARFIQRQLVETRQITKH 935

E + W+K+ L L ++++ L K ++ F R L +TR I + 

Sbjct: 591 FEAFGNDSAKWQKIEVLAKNLPTKKQKRILDK NYKDKEQKNFKDRNLNDTRYIARL 646 

Query: 936 VARI LDERFNNELDSKGRRIRKVKIVTLKSNLVSNFRKEFGFYKIREVNNY 986

V L + N +L+ ++ KV + L S R +GF N+ 

Sbjct: 647 VLNYTKDYLDFLPLSDDENTKLNDT-QKGSKVHVEAKSGMLTSA^ 705 

Query: 987 HHAHDAYLNAWAKAILTKYPQLEPE 1012 
HHA DA + A +I+ + + E

Sbjct: 706 HHAIDAVIIAYANNSIVKAFSDFKKE 731 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4211> which encodes the amino acid
sequence <SEQ ID 4212>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0973 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 881/1380 (63%), Positives = 1088/1380 (78%), Gaps = 22/1380 (1%)

Query: 1 MNKPYSIGLDIGTNSVGWSIITDDYKVPAKKMRVLGNTDKEYIKKNLIGALLFDGGNTAA 60 

M+K YSIGLDIGTNSVGW++ITD+YKVP+KK +VLGNTD+ IKKNLIGALLFD G TA 
Sbjct: 1 MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAE 60 

Query: 61 DRRLKRTARRRYTRRRNRILYLQEIFAEEMSKVDDSFFHRLEDSFLVEEDKRGSKYPIFA 120

RLKRTARRRYTRR+NRI YLQEIF+ EM+KVDDSFFHRLE+SFLVEEDK+ ++PIF 
Sbjct: 61 ATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFG 120 

Query: 121 TLQEEKDYHEKFSTIYHLRKELADKKEKADLRLIYIALAHIIKFRGHFLIEDDSFDVRNT 180
+ +E YHEK+ TIYHLRK+L D +KADLRLIY+ALAH+IKFRGHFLIE D + N+

Sbjct: 121 NIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGD-LNPDNS 179 

Query: 181 DISKQYQDFLEIFNTTFENNDLLSQNVDVEAILTDKISKSAKKDRILAQYPNQKSTGIFA 240 
D+ K + ++ +N FE N + + VD +AIL+ ++SKS + + ++AQ P +K G+F
Sbjct: 180 DVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG 239

Query: 241 EFLKLIVGNQADFKKYFNLEDKTPLQFAKDSYDEDLENLLGQIGDEFADLFSAAKKLYDS 300 

+ L +G +FK F+L + LQ +KD+YD+DL+NLL QIGD++ADLF AAK L D+ 
Sbjct: 240 NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDA 299

Query: 301 VLLSGILTVIDLSTKAPLSASMIQRYDEHREDLKQLKQFVKASLPEKYQEIFADSSKDGY 360

+LLS IL V TKAPLSASMI+RYDEH +DL LK V+ LPEKY+EIF D SK+GY
Sbjct: 300 ILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGY 359

Query: 361 AGYIEGKTNQEAFYKYLSKLLTKQEDSENFLEKIKNEDFLRKQRTFDNGSIPHQVHLTEL 420

AGYI+G +QE FYK++ +L K + +E L K+ ED LRKQRTFDNGSIPHQ+HL EL
Sbjct: 360 AGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGEL 419 

Query: 421 KAIIRRQSEYYPFLKENQDRIEKILTFRIPYYIGPLAREKSDFAWMTRKTDDSIRPWNFE 480 
AI+RRQ ++YPFLK+N+++IEKILTFRIPYY+GPLAR S FAWMTRK++++I PWNFE
Sbjct: 420 HAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE 479

Query: 481 DLVDKEKSAEAFIHRMTNNDFYLPEEKVLPKHSLIYEKFTVYNELTKVRYKNE-QGETYF 539

++VDK SA++FI RMTN D LP EKVLPKHSL+YE FTVYNELTKV+Y E + F 
Sbjct: 480 EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAF 539

Query: 540 FDSNIKQEIFDGVFKEHRKVSKKKLLDFLAKEYEEFRIVDVIGLDKENKAFNASLGTYHD 599 

K+ I D +FK +RKV+ K+L + K+ E F V++ G++ FNASLGTYHD 
Sbjct: 540 LSGEQKKAI VDLLFKTNRKVTVKQLKEDYFKKIECFDSVE I SGVEDR FNASLGTYHD 596 

Query: 600 LEKIL-DKDFLDNPDNESILEDIVQTLTLFEDREMIKKRLENYKDLFTESQLKKLYRRHY 658

L KI+ DKDFLDN +NE ILEDIV TLTLFEDREMI ++RL+ Y LF + +K+L RR Y 
Sbjct: 597 LLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY 656 

Query: 659 TGWGRLSAKLINGIRDKESQKTILDYLIDDGRSNRNFMQLINDDGLSFKSIISKAQAGSH 718

TGWGRLS KLINGIRDK+S KTILD+L DG +NRNFMQLI+DD L+FK I KAQ
Sbjct: 657 TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ 716 

Query: 719 SDNLKEVVGELAGSPAIKKGILQSLKIVDELVKVMG-YEPEQIVVEMARENQTTNQGRRN 777
D+L E + LAGSPAIKKGILQ++K+VDELVKVMG ++PE IV+EMARENQTT +G++N

Sbjct: 717 GDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKN 776

Query: 778 SRQRYKLLDDGVKNLASDLNGNILKEYPTDNQALQNERLFLYYLQNGRDMYTGEALDIDN 837 
SR+R K +++G+K L S ILKE+P +N LQNE+L+LYYLQNGRDMY + LDI+ 

Sbjct: 777 SRERMKRIEEGIKELGS QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR 832

Query: 838 LSQYDIDHIIPQAFIKDDSIDNRVLVSSAKNRGKSDDVPSLEIVKDCKVFWKKLLDAKLM 897 

LS YD+DHI+PQ+F+KDDSIDN+VL S KNRGKSD+VPS E+VK K +W++LL+AKL+ 
Sbjct: 833 LSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI 892

Query: 898 SQRKYDNLTKAERGGLTSDDKARFIQRQLVETRQITKHVARILDERFNNELDSKGRRIRK 957 

+QRK+DNLTKAERGGL+ DKA FI+RQLVETRQITKHVA+ILD R N + D + IR+ 
Sbjct: 893 TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIRE 952



Query: 958 VKIVTLKSNLVSNFRKEFGFYKIREVNNYHHAHDAYLNAVVAKAILTKYPQLEPEFVYGD 1017

VK++TLKS LVS+FRK+F FYK+RE+NNYHHAHDAYLNAVV A++ KYP+LE EFVYGD
Sbjct: 953 VKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD 1012

Query: 1018 YPKYN SYKTRKSATEKLFFYSNIMNFFKTKATrLADGOWAAKDDIEVNNDTGEI 1070 

Y Y+ S + AT K FFYSNIMNFFKT++TLA+G + + IE N +TGEI

Sbjct: 1013 YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEI 1072 

Query: 1071 VWDKKKHFATVRKVLSYPQNNIVKKTEIQTGGFSKESILAHGNSDKLIPRKTKDIYLDPK 1130
VWDK + FATVRKVLS PQ NIVKKTE+QTGGFSKESIL NSDKLI RK KD DPK
Sbjct: 1073 VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARK-KD--WDPK 1129

Query: 1131 KYGGFDSPIVAYSVLVVADIKKGKAQKLKTVTELLGITIMERSRFEKNPSAFLESKGYLN 1190

KYGGFDSP VAYSVLVVA ++KGK++KLK+V ELLGITIMERS FEKNP FLE+KGY
Sbjct: 1130 KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKE 1189



Query: 1191 IRADKLIILPKYSLFELENGRRRLLASAGELQKGNELALPTQFMKFLYLASRYNESKGKP 1250 

++ D +I LPKYSLFELENGR+R+LASAGELQKGNELALP++++ FLYLAS Y + KG P
Sbjct: 1190 VKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP 1249



Query: 1251 EEIEKKQEFVNQHVSYFDDILQLINDFSKRVILADANLEKINKLYQDNKENISVDELANN 1310

E+ E+KQ FV QH Y D+I++ I++FSKRVILADANL+K+ Y +++ + E A N 
Sbjct: 1250 EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK-PIREQAEN 1308

Query: 1311 IINLFTFTSLGAPAAFKFFDKIVDRKRYTSTKEVLNSTLIHQSITGLYETRIDLGKLGED 1370 
II+LFT T+LGAPAAFK+FD +DRKRYTSTKEVL++TLIHQSITGLYETRIDL +LG D

Sbjct: 1309 IIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD 1368 

SEQ ID 4210 (GBS317) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 27 (lane 2; MW 179.3kDa) and in Figure 159 (lanes 5 & 6; MW 180kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
27 (lane 3; MW 154.3kDa) and in Figure 159 (lanes 9 & 10; MW 154kDa).
GBS317-GST was purified as shown in Figure 224, lanes 9-10. GBS317-His was purified as shown in
Figure 222, lane 9.

GBS317N was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 149 (lanes 2-4; MW 116kDa).

GBS317C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 166 (lanes 6-8; MW 92kDa).

GBS317dN was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 187 (lane 7; MW 116kDa). Purified GBS317dN-GST is shown in Figure 245, lane 8.

GBS317dC was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 188 (lane 13; MW 92kDa). Purified GBS317dC-GST is shown in Figure 245, lane 9.
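The molecular weights quoted above are apparent values for fusion products, so they include the GST or His tag. For comparison, the expected mass of an untagged polypeptide can be estimated from its sequence: sum the average residue masses and add one water molecule.

The sketch below makes three assumptions:
- it uses standard average (not monoisotopic) residue masses;
- it assumes the sequence contains only the 20 standard amino acids;
- the SEQ ID sequences are not reproduced in this section, so the caller has to supply them.

The function names are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def molecular_weight_da(sequence: str) -> float:
    """Estimate the average molecular weight of an unmodified polypeptide, in Da."""
    residues = sequence.upper().replace(" ", "")
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER


def molecular_weight_kda(sequence: str) -> float:
    """Same estimate expressed in kDa, the unit used for the gel figures."""
    return molecular_weight_da(sequence) / 1000


# Glycylglycine as a quick sanity check of the arithmetic.
print(round(molecular_weight_da("GG"), 2))
```

Tags, post-translational modifications and anomalous migration on SDS-PAGE can all make an apparent gel size differ from this estimate.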

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1376 

A DNA sequence (GBSx1461) was identified in S.agalactiae <SEQ ID 4213> which encodes the amino
acid sequence <SEQ ID 4214>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -11.94 Transmembrane 132 - 148 ( 123 - 156)
INTEGRAL Likelihood = -11.09 Transmembrane 190 - 206 ( 183 - 209)
INTEGRAL Likelihood = -4.94 Transmembrane 95 - 111 ( 94 - 115)
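Each predicted transmembrane segment above spans 17 residues (for example 132 - 148). A common, independent way to visualise such segments is a sliding-window hydropathy profile.

The sketch below uses the Kyte-Doolittle scale with a 17-residue window. This is a swapped-in illustration, not the method that produced the INTEGRAL likelihood scores. The window size, the 1.6 cutoff and the function names are all assumptions.

```python
# Kyte-Doolittle hydropathy values; positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 17) -> list[float]:
    """Mean hydropathy of each window; entry i covers residues i+1..i+window (1-based)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def hydrophobic_windows(sequence: str, window: int = 17, threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# A hydrophobic 17-mer flanked by lysines, as a toy input.
print(hydrophobic_windows("KKKK" + "LIVFALLIVAGLLIVAL" + "KKKK"))
```

Consecutive high-scoring windows overlap, so in practice a run of reported start positions marks a single candidate segment rather than several.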



The protein has no significant homology with any sequences in the GENPEPT database. 

A related sequence was also identified in GAS <SEQ ID 9133> which encodes the amino acid sequence 
<SEQ ID 9134>. Analysis of this protein sequence reveals the following: 



>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -7.32 Transmembrane 126 - 142 
INTEGRAL Likelihood = -6.90 Transmembrane 178 - 194 

Final Results 

bacterial membrane Certainty=0.3930 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/204 (46%), Positives = 139/204 (68%)

Query: 5 LMKDKLLVVLTWIWIISIATIATIYIAWLIYPIEIQFLKLEKVVYLKAETIYYNFNKLMI 64

+M + ++ +W+W+++LA L TIY WL YP+E+ LKLE+VV++ + I +N+N L+
Sbjct: 4 VMVENTKLLCSVTOWLLALAILITIYSTWLVWPLEVDHLKLEQVVFMSKDAILHNYNGLIiN 63 

Query: 65 YLTHPFISDLNMPSFPSSEDGLKHFADVKYLFTLAHGLFVTLTFPVIYFLRRGWKQKSIF 124

YLT+PF++ L +F SS DGLKHFADVK+LF L +F+ L +P + + K K + 
Sbjct: 64 YLTNPFVTRLEFANFHSSADGLKHFADVKWLFHLTQVVFLGLLYPTLKTFTQRLKTKRFW 123 



Final Results

bacterial membrane Certainty=0.5776 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

Possible site: 22
Query: 125 LYEGFFKIAIMLPIFIVVCAFLLGFDQFFTLFHEVLFPGDSTWQFNPLTDPVIWILPETF 184

L + +A + P+ I + A +GF+ FFTLFH+VLF GDS+W F+PL D VIWILPE F 
Sbjct: 124 LLQKPLILAALFPLMIGLMASFIGFEHFFTLFHQVLFVGDSSWLFDPLKDSVIWILPEVF 183 


Query: 185 FLHCFIIFLLIYETITIILLIIGR 208

FLHCF+ F+++YE I L+ + R
Sbjct: 184 FLHCFLFFMIVYEIILWSLVGLAR 207
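The midline of each alignment block works as follows:
- a residue letter marks a position where query and subject are identical;
- "+" marks a conservative substitution;
- a blank marks any other position.

The identity count can therefore be recomputed directly from a pair of aligned rows. The minimal sketch below assumes equal-length rows with "-" marking gaps, and the function name is hypothetical. Applied to the last row pair of this alignment (Query 185-208 against Sbjct 184-207), it counts the identical positions that appear as letters in that block's midline.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count aligned columns where both rows carry the same residue (gaps excluded)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


query = "FLHCFIIFLLIYETITIILLIIGR"  # Query 185-208
sbjct = "FLHCFLFFMIVYEIILWSLVGLAR"  # Sbjct 184-207
print(count_identities(query, sbjct), "/", len(query))
```

Summing this count over every row pair of an alignment gives the numerator of the "Identities" figure, provided each row was transcribed correctly.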

SEQ ID 4214 (GBS167) was expressed in and purified from E.coli. The purified protein is shown in lanes 5
& 6 of Figure 223.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1377 

A DNA sequence (GBSx1462) was identified in S.agalactiae <SEQ ID 4217> which encodes the amino
acid sequence <SEQ ID 4218>. This protein is predicted to be p-nitrophenyl phosphatase (pho2). Analysis
of this protein sequence reveals the following:

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3925 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15219 GB:Z99120 similar to N-acetyl-glucosamine catabolism 
[Bacillus subtilis] 
Identities = 121/249 (48%), Positives = 172/249 (68%)

Query: 3 YKGYLIDLDGTIYKGKSRIPAGERFIERLQEKGIPYMLVTNNTTRTPESVQEMLRGFNVE 62 

YKGYLIDLDGT+Y G +I F+ L+++G+PY+ VTNN++RTP+ V + L F++

Sbjct: 4 YKGYLIDLDGTMYNGTEKIEFACEFVRTLKDRGVPYLFVTNNSSRTPKQVADKLVSFDIP 63

Query: 63 TPLETIYTATMATVDYMNDMNRGKTAYVIGEEGLKKAIADAGYVEDTKNPAYVVVGLDWN 122

E ++T +MAT ++ + + YVIGEEG+++AI + G +N +VVVG+D +

Sbjct: 64 ATEEQVFTTSMATAQHIAQQKKDASVYVIGEEGIRQAIEENGLTFGGENADFVVVGIDRS 123

Query: 123 VTYDKLATATLAIQNGALFIGTNPDLNIPTERGLLPGAGSLNALLEAATRIKPVFIGKPN 182
+TY+K A LAI+NGA FI TN D+ IPTERGLLPG GSL ++L +T ++PVFIGKP

Sbjct: 124 ITYEKFAVGCLAIRNGARFISTNGDIAIPTERGLLPGNGSLTSVLTVSTGVQPVFIGKPE 183 

Query: 183 AIIMNKALEILNIPRNQAVMVGDNYLTDIMAGINNDIDTLLVTTGFTTVEEVPDLPIQPS 242
+IIM +A+ +L ++ +MVGDNY TDIMAGIN +DTLLV TG T E + D +P+
Sbjct: 184 SIIMEQAMRVLGTDVSETLMVGDNYATDIMAGINAGMDTLLVHTGVTKREHMTDDMEKPT 243

Query: 243 YVLASLDEW 251 

+ + SL EW 
Sbjct: 244 HAIDSLTEW 252 


A related DNA sequence was identified in S.pyogenes <SEQ ID 4219> which encodes the amino acid 
sequence <SEQ ID 4220>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.53 Transmembrane 128 - 144 ( 128 - 144)



Final Results 

bacterial membrane Certainty=0.1213 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15219 GB:Z99120 similar to N-acetyl-glucosamine catabolism
[Bacillus subtilis]
Identities = 121/250 (48%), Positives = 166/250 (66%), Gaps = 1/250 (0%)

Query: 3 YKGYLIDLDGTIYQGKNRIPAGERFIKRLQERGIPYLLVTNNTTRTPEMVQSMLANQFHV 62
YKGYLIDLDGT+Y G +I F++ L++RG+PYL VTNN++RTP+ V L + F +

Sbjct: 4 YKGYLIDLDGTMYNGTEKIEFACEFVRTLKDRGVPYLFVTNNSSRTPKQVADKLVS-FDI 62 

Query: 63 ETSIETIYTATMATVDYMNDMNRGKTAYVIGETGLKSAIAAAGYVEELENPAYVVVGLDS 122
+ E ++T +MAT ++ + + YVIGE G++ AI G EN +VVVG+D

Sbjct: 63 PATEEQVFTTSMATAQHIAQQKKDASVYVIGEEGIRQAIEENGLTFGGENADFVVVGIDR 122

Query: 123 QVTYEMLAIATLAIQKGALFIGTNPDLNIPTERGLMPGAGALNALLEAATRVKPVFIGKP 182

+TYE A+ LAI+ GA FI TN D+ IPTERGL+PG G+L ++L +T V+PVFIGKP
Sbjct: 123 SITYEKFAVGCLAIRNGARFISTNGDIAIPTERGLLPGNGSLTSVLTVSTGVQPVFIGKP 182 


Query: 183 NAIIMNKSLEVLGIQRSEAVMVGDNYLTDIMAGIQNDIATILVTTGFTRPEEVPTLPIQP 242 

+IIM +++ VLG SE +MVGDNY TDIMAGI + T+LV TG T+ E + +P 
Sbjct: 183 ESIIMEQAMRVLGTDVSETLMVGDNYATDIMAGINAGMDTLLVHTGVTKREHMTDDMEKP 242 

Query: 243 DHVLSSLDEW 252

H + SL EW 
Sbjct: 243 THAIDSLTEW 252 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 207/250 (82%), Positives = 227/250 (90%), Gaps = 1/250 (0%)

Query: 3 YKGYLIDLDGTIYKGKSRIPAGERFIERLQEKGIPYMLVTNNTTRTPESVQEMLRG-FNV 61 

YKGYLIDLDGTIY+GK+RIPAGERFI+RLQE+GIPY+LVTNNTTRTPE VQ ML F+V 
Sbjct: 3 YKGYLIDLDGTIYQGKNRIPAGERFIKRLQERGIPYLLVTNNTTRTPEMVQSMLANQFHV 62




Query: 62 ETPLETIYTATMATVDYMNDMNRGKTAYVIGEEGLKKAIADAGYVEDTKNPAYVVVGLDW 121 

ET +ETIYTATMATVDYMNDMNRGKTAYVIGE GLK AIA AGYVE+ +NPAYVVVGLD
Sbjct: 63 ETSIETIYTATMATVDYMNDMNRGKTAYVIGETGLKSAIAAAGYVEELENPAYVVVGLDS 122

Query: 122 NVTYDKLATATLAIQNGALFIGTNPDLNIPTERGLLPGAGSLNALLEAATRIKPVFIGKP 181

VTY+ LA ATLAIQ GALFIGTNPDLNIPTERGL+PGAG+LNALLEAATR+KPVFIGKP 
Sbjct: 123 QVTYEMLAIATLAIQKGALFIGTNPDLNIPTERGLMPGAGALNALLEAATRVKPVFIGKP 182 

Query: 182 NAIIMNKALEILNIPRNQAVMVGDNYLTDIMAGINNDIDTLLVTTGFTTVEEVPDLPIQP 241
NAIIMNK+LE+L I R++AVMVGDNYLTDIMAGI NDI T+LVTTGFT EEVP LPIQP

Sbjct: 183 NAIIMNKSLEVLGIQRSEAVMVGDNYLTDIMAGIQNDIATILVTTGFTRPEEVPTLPIQP 242 

Query: 242 SYVLASLDEW 251 
+VL+SLDEW 
Sbjct: 243 DHVLSSLDEW 252

A similar DNA sequence was identified in S. pyogenes <SEQ ID 4215> which encodes amino acid sequence 
<SEQ ID 4216>. An alignment of the GAS and GBS sequences follows: 

Identities = 94/204 (46%), Positives = 139/204 (68%)

Query: 4 VMWOT^LCSVWWLIALAILITIYSTV&W^^ 63 

+M + ++ +W+W+++LA L TIY WL YP+E+ LKLE+VV++ + I +N+N L+
Sbjct: 5 LMKDKLLVVLTWIWIISIATLATIYIAWLIYPIEIQFLKLEKVVYLKAETIYYNFNKLMI 64

Query: 64 YLTNPFVTRLEFANFHSSADGLKHFADVKWLFHLTQVVFLGLLYPTLKTFTQRLKTKRFW 123

YLT+PF++ L +F SS DGLKHFADVK+LF L +F+ L +P + + K K +
Sbjct: 65 YLTHPFISDLNMPSFPSSEDGLKHFADVKYLFTLAHGLFVILTFPVIYFLRRGWKQKSIF 124 



Query: 124 LLQKPLIIAALFPLMIGLMASFIGFEHFFTLFHQVLFVGDSSWLFDPLKDSVIWILPEVF 183 
L + +A + P+ I + A +GF+ FFTLFH+VLF GDS+W F+PL D VIWILPE F
Sbjct: 125 LYEGFFKIAIMLPIFIVVCAFLLGFDQFFTLFHEVLFPGDSTWQFNPLTDPVIWILPETF 184

Query: 184 FLHCFLFFMIVYEIILWSLVGLAR 207 
FLHCF+ F+++YE I L+ + R

Sbjct: 185 FLHCFIIFLLIYETITIILLIIGR 208 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1378

A DNA sequence (GBSx1463) was identified in S.agalactiae <SEQ ID 4221> which encodes the amino
acid sequence <SEQ ID 4222>. This protein is predicted to be oleoyl-acyl carrier protein thioesterase. 
Analysis of this protein sequence reveals the following: 

Possible site: 39
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3332 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB02069 GB:AB026647 acyl carrier protein thioesterase 
[Arabidopsis thaliana] 
Identities = 59/248 (23%), Positives = 104/248 (41%), Gaps = 30/248 (12%)

Query: 2 GLLYRETYEVPFYESDTNHYMKLPQLLALALQISAKQSLKLGIGDDIVFKRYGLV 56
GL Y+E + V YE +N + + L ++ + +G D ++ L+
Sbjct: 81 GLSYKEKFWRSYEVGSNKTATVETIANLLQEVGCNHAQSVGFSTDGFATTTTMRKLHLI 140

Query: 57 WVVTDYIIDIERLPKHAEKIVIETEAKAHNKLLCYRYFYIYGE-DGQKIITISSAFVLMD 115
WV I + I + P + + IET ++ ++ R + + G+ +S +V+M+
Sbjct: 141

Query: 116 FKTRKIHPVLDDITSIY QSQRIKKVIRGPKYHPIGDSKVKQYHVR 160
TR++ V DD+ Y ++ +KK+ PK + R
Sbjct: 201 DTRRLQKVSDD VRDEYLVFCPQEPRLAFPEENNRSLKKI---PKLEDPAQYSMIGLKPR 257

Query: 161
DLDMN HVNN Y+W+ + ++ + +H+ILY+EQ + DL
Sbjct: 258 ILDMNQHVNNVTYIGWVLESIPQEIVDTHELQVITLDYRRECQQDDVV DSLT 311

Query: 221
IGG
Sbjct: 312



A related DNA sequence was identified in S. pyogenes <SEQ ID 4223> which encodes the amino acid 
sequence <SEQ ID 4224>. Analysis of this protein sequence reveals the following: 

Possible site: 54
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.88 Transmembrane 21 - 37 ( 21 - 38)

Final Results

bacterial membrane Certainty=0.2550 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 
>GP:AAB71730 GB:U65643 acyl-ACP thioesterase [Myristica fragrans]
Identities = 41/128 (32%), Positives = 67/128 (52%), Gaps = 11/128 (8%)

Query: 33 FIFMIKRGGLLVDILAYFALLNPDTRKVATIPEDLVAPFETDFVKKLHRV PKMPL 87 

F+ K G +L + + ++N TR+++ IPE++ E FV+ H V K+P

Sbjct: 147 FLRDCKTGEILTRATSVWVMMNKRTRRLSKIPEEVRVEIEPYFVE--HGVLDEDSRKLPK 204

Query: 88 LEQS IDRDYYVRYFDIDMNGHVNNSKYLDWMYDVLGCEFLKTHQPLKMTLKYVKEV 143

L + I R R+ D+D+N HVNN KY+ W+ + + L++H+ MTL+Y KE 
Sbjct: 205 I^OTMtYIRRGLAPRWSDLDWQHWNvTCYIGWILESVPSSLLESHELYGMTLEYRKEC 264

Query: 144 SPGGQITS 151 
G + S 

Sbjct: 265 GKDGLLQS 272 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 62/144 (43%), Positives = 94/144 (65%)

Query: 101 GQKIITISSAFVLMDFKTRKIHPVLDDITSIYQSQRIKKVIRGPKYHPIGDSKVKQYHVR 160 
G ++ I + F L++ TRK+ + +D+ + +++ +KK+ R PK + S + Y+VR

Sbjct: 40 GGLLVDILAYFALLNPDTRKVATIPEDLVAPFETDFVKKLHRVPKMPLLEQSIDRDYYVR 99

Query: 161 YFDLDMNGHVNNSKYLEWMYDVLDLDFLSSHIPKKIDLKYIKEIQYGTDIKSHWYQDGLV 220 
YFD+DMNGHVNNSKYL+WMYDVL +FL +H P K+ LKY+KE+ G I S ++ D L 
Sbjct: 100 YFDIDMNGHVNNSKYLDWMYDVLGCEFLKTHQPLKMTLKYVKEVSPGGQITSSYHLDQLT 159

Query: 221 TRHDIIGGDAIHAQARIEWQEKKE 244

+ H I ++AQA IEW+ K+ 

Sbjct: 160 SYHQITSDGQLNAQAMIEWRAIKQ 183 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1379 

A DNA sequence (GBSx1464) was identified in S.agalactiae <SEQ ID 4225> which encodes the amino
acid sequence <SEQ ID 4226>. This protein is predicted to be coproporphyrinogen III oxidase. Analysis of
this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1484 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05062 GB:AP001511 coproporphyrinogen III oxidase [Bacillus halodurans] 
Identities = 173/375 (46%), Positives = 248/375 (66%), Gaps = 5/375 (1%)

Query: 5 PTSAYVHIPFCTQICYYCDFSKVFIKNQPVDAYLQALIREFR SYDITELRTLYIGG 60 

P +AY+HIPFC ICYYCDF+K ++KNQPV+ YLQAL E L+TLY+GG

Sbjct: 2 PKAAYIHIPFCEHICYYCDFNKFYLKNQPVNEYLQALETEMAMVVAEQPTKSLQTLYVGG 61 

Query: 61 GTPTSISAVQLDYLLTELSRDLNLNTLEEFTIEANPGDLTVDKIEVLQKSAVNRVSLGVQ 120
GTPT+++A QL LL + R L L+ LEEFT E NP + +K++VL+ V+R+S+GVQ 
Sbjct: 62 GTPTALTADQLAQLLASIKRTLPLSDLEEFTFEVNPDSIDEEKLDVLRSYGVDRLSIGVQ 121

Query: 121 TFNDKHLKRIGRSHNEAQIYSTIDALKTAGFQNISIDLIYALPGQTMDDVRSNVAKALSL 180 

F LK IGR+H++ + ++ + AGF N+S+DL+ LP QT + + +A +L 

Sbjct: 122 AFQPLLLKEIGRTHDQKSVEQAVEKSRQAGFANLSLDLMLGLPKQTPEMFAETLKEAFAL 181 
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Query: 181 NI PHLSLYSL I LEHHTVFMNKMRRGKLHLPTEDLEAEMFEYI I SEMERNGFEHYE I SNFT 240 

+ HLS YSL +E TVF N+ R+G+L LP ED E +M+ + E E++GF+ YEISNF 
Sbjct: 182 EVEHLSCYSLKVEAKTVFYNRQRQGRLTLPPEDDEVKMYRQLCYETEKHGFKQYEISNFA 241 

5 Query: 241 KPGFESRHmMYWDNVEYYGVGAGASGYLDGIRYRNRGPIQHYLKGVSEGNARLSE-EVL 299 

K G+ESRHNL+YW+N EYYG GAGA GY+ G+RY N GP+ YL+ + EG + E + 
Sbjct: 242 KKGYESRHNLVYWNNDEYYGFGAGAHGYVGGVRYMNHGPLPKYLQAMEEGRRPVFESHHV 301 

Query: 300 SKNEMMEEELFLGLRKKEGVSIGKFEQKFGTSFEKRYGQIVQELQSDGLLKENNGFIQMT 359 
10 S+ E MEE++FLGLRK+ GV F ++FG S Y + + +L + LL+ + +++T 

Sbjct: 302 SRVEQMEEQMFLGLRKRSGVEERVFVERFGVSMFSLYEKQIAQLVARCLLERTDDRVRLT 361 

Query: 360 KKGLFLGDTVAEKFI 374 
+GL LG+ V E+F+ 
15 Sbjct: 362 DEGLLLGNEVFEQFL 376 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4227> which encodes the amino acid
sequence <SEQ ID 4228>. Analysis of this protein sequence reveals the following:

Possible site: 34
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3202 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

An alignment of the GAS and GBS proteins is shown below.

Identities = 304/376 (80%), Positives = 343/376 (90%)

Query: 1 MLKKPTSAYVHIPFCTQICYYCDFSKVFIKNQPVDAYLQALIREFRSYDITELRTLYIGG 60

M KKPTSAYVHIPFCTQICYYCDFSKVFI+NQPVDAYL+ALI+EF SY I +L+TLYIGG
Sbjct: 33 MSKKPTSAYVHIPFCTQICYYCDFSKVFIQNQPVDAYLKALIQEFDSYGIRDLKTLYIGG 92

Query: 61 GTPTSISAVQLDYLLTELSRDLNLNTLEEFTIEANPGDLTVDKIEVLQKSAVNRVSLGVQ 120
GTPT+I+A QL+YLL L R+LNL+ LEEFTIEANPGDLT +KI VLQ+SAVNR+SLGVQ

Sbjct: 93 GTPTAITAKQLEYLLNHLERNLNLDDLEEFTIEANPGDLTPEKIAVLQRSAVNRISLGVQ 152

Query: 121 TFNDKHLKRIGRSHNEAQIYSTIDALKTAGFQNISIDLIYALPGQTMDDVRSNVAKALSL 180
TFN+K LK+IGRSHNE QIYSTI LKTAGF NISIDLIYALPGQT+D V+ NVAKAL+L
Sbjct: 153 TFNNKQLKQIGRSHNEEQIYSTIANLKTAGFHNISIDLIYALPGQTLDQVKENVAKALAL 212

Query: 181 NIPHLSLYSLILEHHTVFMNKMRRGKLHLPTEDLEAEMFEYIISEMERNGFEHYEISNFT 240

+IPHLSLYSLILEHHTVFMNKMRRGKL+LPTEDLEAEMFEYIISEME NGFEHYEISNFT
Sbjct: 213 DIPHLSLYSLILEHHTVFMNKMRRGKLNLPTEDLEAEMFEYIISEMEANGFEHYEISNFT 272

Query: 241 KPGFESRHNLMYWDNVEYYGVGAGASGYLDGIRYRNRGPIQHYLKGVSEGNARLSEEVLS 300

KPGFESRHNLMYWDNVEY+G GAGASGYL+GIRY+NR PIQHYLK V GNARL+EEVL
Sbjct: 273 KPGFESRHNLMYWDNVEYFGCGAGASGYOTGIRYQNRVPIQHYLKAVEAGNARLNEEVLR 332

Query: 301 KNEMMEEELFLGLRKKEGVSIGKFEQKFGTSFEKRYGQIVQELQSDGLLKENNGFIQMTK 360

K EMMEEELFLGLRKK GVSI +F++KFG SFE+RYG IV+ELQ+ GLL +++ F++MTK
Sbjct: 333 KEEMMEEELFLGLRKKTGVSIQRFQEKFGMSFEERYGNIVRELQNQGLLVKDDAFVRMTK 392

Query: 361 KGLFLGDTVAEKFIVE 376
KGLFLGD+VAE+FI++

Sbjct: 393 KGLFLGDSVAERFILD 408
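As a reading aid for the alignments above (a sketch, not part of the original analysis): in standard BLAST pairwise output, the line between each Query and Sbjct row shows the residue letter where the two sequences are identical, '+' where the substitution has a positive score, and a blank otherwise, so the "Positives" count always includes the identities. The minimal Python sketch below counts both from a column-aligned midline; the function name `count_identities_positives` is hypothetical, and the example uses the first 12 columns of the GAS/GBS alignment for coproporphyrinogen III oxidase in Example 1379. The extracted midlines in this document have lost their column spacing, so they cannot be fed in directly.

```python
def count_identities_positives(midline: str) -> tuple[int, int]:
    """Count identities and positives in a BLAST-style alignment midline.

    Assumes the midline is column-aligned with its Query/Sbjct rows:
    a residue letter marks an identical position, '+' marks a
    positive-scoring substitution, and a space marks a mismatch or gap.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    # Positives include the identical positions plus the '+' columns.
    positives = identities + midline.count("+")
    return identities, positives


# First 12 columns of the Example 1379 GAS/GBS alignment.
query = "GTPTSISAVQLD"
midline = "GTPT+I+A QL+"
sbjct = "GTPTAITAKQLE"
assert len(query) == len(midline) == len(sbjct)

ident, pos = count_identities_positives(midline)
print(f"Identities = {ident}/{len(midline)}, Positives = {pos}/{len(midline)}")
# prints: Identities = 8/12, Positives = 11/12
```

Gap columns are blank in the midline, so they count toward neither total while still adding to the alignment length used as the denominator.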



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 



-1513- 



PCT/GB01/04789 



Example 1380 

A DNA sequence (GBSx1465) was identified in S.agalactiae <SEQ ID 4229> which encodes the amino
acid sequence <SEQ ID 4230>. Analysis of this protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3729 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1381 

A DNA sequence (GBSx1466) was identified in S.agalactiae <SEQ ID 4231> which encodes the amino
acid sequence <SEQ ID 4232>. Analysis of this protein sequence reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2989 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4233> which encodes the amino acid
sequence <SEQ ID 4234>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2993 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

An alignment of the GAS and GBS proteins is shown below.

Identities = 36/109 (33%), Positives = 58/109 (53%), Gaps = 6/109 (5%)

Query: 9 WAKHKYLVLSKSQKIYLDIRQTLKSPNCT VLDVQSLIDQAVLLEESPSQVTNAYMHI 65 

WA KY V++ SQ+ Y +R+ K + VL LI++A + + + AY H+ 
Sbjct: 13 WAYQKYWVMAHSQQHYNALRELFKGNQWSEEKVLTFHC3jIEFAQAIPPWKSIjRTAYQHV 72 

Query: 66 WGYFKNKAERQEKEEFLTLLEKYRKTGYQRRKLLAFLKQLIAKYPNSYL 114
WGYFK A ++EK+ F L + + ++L FL+++ A Y SYL

Sbjct: 73 WGYFKKVASQEEKDHFKDLDAQLET---KSEEMLCFLQEMTAHYQPSYL 118



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1382

A DNA sequence (GBSx1467) was identified in S.agalactiae <SEQ ID 4235> which encodes the amino
acid sequence <SEQ ID 4236>. This protein is predicted to be mrsA (mrsA). Analysis of this protein
sequence reveals the following:

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.96 Transmembrane 56 - 72 ( 56 - 72) 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11970 GB:Z99105 similar to phosphoglucomutase (glycolysis) 
[Bacillus subtilis] 
Identities = 284/451 (62%), Positives = 353/451 (77%), Gaps = 4/451 (0%) 

Query: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETDRPRVFVARDTRISGEMLESA 60

MGKYFGTDGVRG AN ELTPELAFK+GRFGGYVL++ + RP+V + RDTRISG MLE A
Sbjct: 1 MGKYFGTDGVRGVANSELTPELAFKVGRFGGYVLTK-DKQRPKVLIGRDTRISGHMLEGA 59

Query: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGSDGFKL 120

L+AGLLS+G EV +LGV++TPGVSYL + A AGVMISASHNP DNGIKFFG DGFKL
Sbjct: 60 LVAGLLSIGAEVMRLGVISTPGVSYLTKAMDAEAGVMISASHNPVQDNGIKFFGGDGFKL 119

Query: 121 DDDRELEIEALLDAKEDTLPRPSAQGLGTLVDYPEGLRKYEKFMESTGI-DLEGMKVALD 179

D++E EIE L+D ED LPRP LG + DY EG +KY +F++ T D G+ VALD
Sbjct: 120 SDEQEAEIERLMDEPEDKLPRPVGADLGLVNDYFEGGQKYLQFLKQTADEDFTGIHVALD 179

Query: 180 TANGAATASARNIFLDLNADISVIGDQPDGLNINDGVGSTHPEQLQSLVRENGSDIGLAF 239

ANGA ++ A ++F DL+AD+S +G P+GLNINDGVGSTHPE L + V+E +D+GLAF
Sbjct: 180 CANGATSSIATHLFADLDADVSTMGTSPNGLNINDGVGSTHPEALSAFVKEKNADLGLAF 239

Query: 240 DGDSDRLIAVDENGEIVDGDKIMFIIGKYLSDKGQLAQNTIVTTVMSNLGFHKALDREGI 299

DGD DRLIAVDE G IVDGD+IM+I K+L +G+L +T+V+TVMSNLGF+KAL++EGI
Sbjct: 240 DGDGDRLIAVDEKGNIVDGDQIMYICSKHLKSEGRLKDDTVVSTVMSNLGFYKALEKEGI 299

Query: 300 HKAITAVGDRYVVEEMRKSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLTKVMKETGKKL 359

TAVGDRYVVE M+K GYN+GGEQSGH+I +DYNTTGDG L+AI L +K TGK L
Sbjct: 300 KSVQTAVGDRYVVEAMKKDGYNVGGEQSGHLIFLDYNTTGDGDLSAIMLMNTLKATGKPL 359

Query: 360 SELASEVTIYPQKLVNIRVENNMKDKAMEVPAIAEIIAKMEEEMDGNGRILVRPSGTEPL 419

SELA+E+ +PQ LVN+RV + K K E + +I+++E+EM+G+GRILVRPSGTEPL
Sbjct: 360 SELAAEMQKFPQLLVNVRVTD--KYKVEENEKVKAVISEVEKEMNGDGRILVRPSGTEPL 417

Query: 420 LRVMAEAPTNEAVDYYVDTIADVVRTEIGLD 450

+RVMAEA TED YV+ I +VVR+E+GL+
Sbjct: 418 VRVMAEAKTKELCDEYVNRIVEVVRSEMGLE 448

A related DNA sequence was identified in S.pyogenes <SEQ ID 4237> which encodes the amino acid
sequence <SEQ ID 4238>. Analysis of this protein sequence reveals the following:

Possible site: 35
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.96 Transmembrane 56 - 72 ( 56 - 72)

Final Results

bacterial membrane Certainty=0.1383 (Affirmative) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ
The protein has homology with the following sequences in the databases:

>GP:CAB11970 GB:Z99105 similar to phosphoglucomutase (glycolysis)
[Bacillus subtilis]
Identities = 287/451 (63%), Positives = 346/451 (76%), Gaps = 4/451 (0%)

Query: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETERPKVFVARDTRISGEMLESA 60

MGKYFGTDGVRG AN ELTPELAFK+GRFGGYVL++ + +RPKV + RDTRISG MLE A
Sbjct: 1 MGKYFGTDGVRGVANSELTPELAFKVGRFGGYVLTK-DKQRPKVLIGRDTRISGHMLEGA 59

Query: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGNDGFKL 120

L+AGLLS+G EV +LGV++TPGVSYL + A AGVMISASHNP DNGIKFFG DGFKL
Sbjct: 60 LVAGLLSIGAEVMRLGVISTPGVSYLTKAMDAEAGVMISASHNPVQDNGIKFFGGDGFKL 119

Query: 121 ADDQELEIEALLDAPEDTLPRPSAEGLGTLVDYPEGLRKYEKFLVTTGT-DLSGMTVALD 179
+D+QE EIE L+D PED LPRP LG + DY EG +KY +FL T D +G+ VALD

Sbjct: 120 SDEQEAEIERLMDEPEDKLPRPVGADLGLVNDYFEGGQKYLQFLKQTADEDFTGIHVALD 179

Query: 180 TANGAASVSARDVFLDLNAEIAVIGEKPNGLNINDGVGSTRPEQLQELVKETGADLGLAF 239
ANGA S A +F DL+A+++ +G PNGLNINDGVGST PE L VKE ADLGLAF
Sbjct: 180 CANGATSSIATHLFADLDADVSTMGTSPNGLNINDGVGSTHPEALSAFVKEKNADLGLAF 239

Query: 240 DGDSDRLIAVDETGEIVDGDRIMFIIGKYLSEKGLLAHNTIVTTVMSNLGFHKALDKQGI 299

DGD DRLIAVDE G IVDGD+IM+I K+L +G L +T+V+TVMSNLGF+KAL+K+GI
Sbjct: 240 DGDGDRLIAVDEKGNIVDGDQIMYICSKHLKSEGRLKDDTVVSTVMSNLGFYKALEKEGI 299

Query: 300 NKAITAVGDRYVVEEMRSSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLAKVMKETGKSL 359

TAVGDRYVVE M+ GYN+GGEQSGH+I +DYNTTGDG L+AI L +K TGK L
Sbjct: 300 KSVQTAVGDRYVVEAMKKDGYNVGGEQSGHLIFLDYNTTGDGLLSAIMLMNTLKATGKPL 359

Query: 360 SELAAEVTIYPQKLVNIRVENSMKERAMEVPAIANIIAKMEDEMAGNGRILVRPSGTEPL 419

SELAAE+ +PQ LVN+RV + K + E + +I+++E EM G+GRILVRPSGTEPL
Sbjct: 360 SELAAEMQKFPQLLVNVRVTD--KYKVEENEKVKAVISEVEKEMNGDGRILVRPSGTEPL 417

Query: 420 LRVMAEAPTDAEVDYYVDTIADVVRTEIGCD 450
+RVMAEA T D YV+ I +VVR+E+G +

Sbjct: 418 VRVMAEAKTKELCDEYVNRIVEVVRSEMGLE 448

An alignment of the GAS and GBS proteins is shown below.

Identities = 400/450 (88%), Positives = 429/450 (94%)

Query: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETDRPRVFVARDTRISGEMLESA 60

MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHET+RP+VFVARDTRISGEMLESA
Sbjct: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETERPKVFVARDTRISGEMLESA 60

Query: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGSDGFKL 120

LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFG+DGFKL
Sbjct: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGNDGFKL 120

Query: 121 DDDRELEIEALLDAKEDTLPRPSAQGLGTLVDYPEGLRKYEKFMESTGIDLEGMKVALDT 180
DD+ELEIEALLDA EDTLPRPSA+GLGTLVDYPEGLRKYEKF+ +TG DL GM VALDT

Sbjct: 121 ADDQELEIEALLDAPEDTLPRPSAEGLGTLVDYPEGLRKYEKFLVTTGTDLSGMTVALDT 180

Query: 181 ANGAATASARNIFLDLNADISVIGDQPDGLNINDGVGSTHPEQLQSLVRENGSDIGLAFD 240
ANGAA+ SAR++FLDLNA+I+VIG++P+GLNINDGVGST PEQLQ LV+E G+D+GLAFD
Sbjct: 181 ANGAASVSARDVFLDLNAEIAVIGEKPNGLNINDGVGSTRPEQLQELVKETGADLGLAFD 240

Query: 241 GDSDRLIAVDENGEIVDGDKIMFIIGKYLSDKGQLAQNTIVTTVMSNLGFHKALDREGIH 300

GDSDRLIAVDE GEIVDGD+IMFIIGKYLS+KG LA NTIVTTVMSNLGFHKALD++GI+
Sbjct: 241 GDSDRLIAVDETGEIVDGDRIMFIIGKYLSEKGLLAHNTIVTTVMSNLGFHKALDKQGIN 300

Query: 301 KAITAVGDRYVVEEMRKSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLTKVMKETGKKLS 360

KAITAVGDRYVVEEMR SGYNLGGEQSGHVIIMDYNTTGDGQLTAIQL KVMKETGK LS
Sbjct: 301 KAITAVGDRYVVEEMRSSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLAKVMKETGKSLS 360

Query: 361 ELASEVTIYPQKLVNIRVENNMKDKAMEVPAIAEIIAKMEEEMDGNGRILVRPSGTEPLL 420

ELA+EVTIYPQKLVNIRVEN+MK++AMEVPAIA IIAKME+EM GNGRILVRPSGTEPLL
Sbjct: 361 ELAAEVTIYPQKLVNIRVENSMKERAMEVPAIANIIAKMEDEMAGNGRILVRPSGTEPLL 420

Query: 421 RVMAEAPTNEAVDYYVDTIADVVRTEIGLD 450
RVMAEAPT+ VDYYVDTIADVVRTEIG D
Sbjct: 421 RVMAEAPTDAEVDYYVDTIADVVRTEIGCD 450

SEQ ID 4236 (GBS402) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 84 (lane 5; MW 78kDa). 

GBS402-GST was purified as shown in Figure 218, lanes 3-5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1383 

A DNA sequence (GBSx1468) was identified in S.agalactiae <SEQ ID 4239> which encodes the amino
acid sequence <SEQ ID 4240>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11969 GB:Z99105 ybbR [Bacillus subtilis]
Identities = 90/324 (27%), Positives = 167/324 (50%), Gaps = 18/324 (5%)

Query: 1 MKKFFTNKFWLGVVSLFLAILLFLTATATSMNHQDNSKIAG ASETYTHTLTDVPI 55

M KF N++ + +++L A+LL++ A + N KG ST TLTD+P+

Sbjct: 1 MDKFLNNRWAVKIIALLFALLLYV AVNSNQAPTPKKPGESFFPTSTTDEATLTDIPV 57

Query: 56 DIKYDSDDYFISGYSYGADVYMS-SVNRVKLDSEINEDTRKFKVVADLTNMKPGTHKVPL 114

YD ++Y ++G +V + S + VK + T+ F++ AD+ ++K GTHKV L
Sbjct: 58 KAYYDDENYVVTGVPQTVNVTIKGSTSAVKKARQ----TKNFEIYADMEHLKTGTHKVEL 113

Query: 115 KVVNLPSGVVATVSPTTITVTMGKKKTKEFPV-YGHVNDKQIKAGYAVDKMSVDVSKVKV 173

K N+ G+ +++P+ TVT+ ++ TK FPV + N ++K GY+ ++ V V++
Sbjct: 114 KAKNVSDGLTISINPSQTTVTIQERTTKSFPVEVEYYNKSKMKKGYSPEQPIVSPKNVQI 173

Query: 174 TSDESIIDRIDHVAANIPDDKVLDDDFNKTVTLQAVTADGTVLASIIHPSKATLSVKVKK 233
T +++ID I A++ + D+ K + DG L + PS ++V V

Sbjct: 174 TGSKWIDNISLHKASVNLENA-DETIEKEAKVTVYDKDGNALPVIIVEPSVIKITVPVTS 232

Query: 234 LTKTVPINLIPVGQFSDSISKINYKLSQEKAVISGTKEALEAISVIN-AEVDISDVTKNT 292
+K VP + G D +S N + S + + G+++ L+++ I+ +D+S + K++
Sbjct: 233 PSKKVPFKIERTGSLPDGVSIANIESSPSEVTVYGSQDVLDSLEFIDGVSLDLSKINKDS 292

Query: 293 --EKKINLSANNVSVDPAQVTVQL 314

E I L + P++VT+ +

Sbjct: 293 DIEADIPLPDGVKKISPSKVTLHI 316

A related DNA sequence was identified in S.pyogenes <SEQ ID 4241> which encodes the amino acid 
sequence <SEQ ID 4242>. Analysis of this protein sequence reveals the following: 

Possible site: 29 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ

The protein has homology with the following sequences in the databases:

>GP:CAB11969 GB:Z99105 ybbR [Bacillus subtilis]
Identities = 81/322 (25%), Positives = 154/322 (47%), Gaps = 15/322 (4%)

Query: 1 MKRFLNSRPWLGMVSVFFAILLFLTAASSNH----NNSSSQIYSPIETYTHSLKDVPIDM 56

M +FLN+R + ++++ FA+LL++ A +SN + T +L D+P+

Sbjct: 1 MDKFLNNRWAVKIIALLFALLLYV-AVNSNQAPTPKKPGESFFPTSTTDFATLTDIPVKA 59

Query: 57 KYDSDKYFISGYSYGAEVYLT-STNRIKLDSEVNNDTRNFKIVADLTHSHPGTVSVNLRV 115
YD + Y ++G V + ST+ +K + T+NF+I AD+ H GT V L+
Sbjct: 60 YYDDENYVVTGVPQTVNVTIKGSTSAVKKARQ----TKNFEIYADMEHLKTGTHKVELKA 115

Query: 116 ENLPSGVTATVSPDKISVTIGKKESKVFPVRGS-VDAKQIANGYEISKIETGVNKVEVTS 174

+N+ G+T +++P +VTI ++ +K FPV + ++ GY + V++T

Sbjct: 116 KNVSDGLTISINPSQTTVTIQERTTKSFPVKVEYYNKSKMKKGYSPEQPIVSPKNVQITG 175

Query: 175 DESTIALIDHVVAKLPDDQVLDRNYSSRVTLQAVSADGTILASAIDPAKTNLSVAVKKIT 234

A++D + DGL ++P+ ++V V +

Sbjct: 176 SKNVIDNISLHKASWLENA-DETIEKFJUOTTVYDKDGNALPVDVEPSVIKITVPVTSPS 234

Query: 235 KSVPIRVEAVGMMDDSLSDIQYKLSKQTAVISGSREVLEDIDEII-AEVNISDVTKNT-- 291

K VP ++E G + D +S + S + GS++VL+ ++ I +++S + K++

Sbjct: 235 KKVPFKIERTGSLPDGVSIANIESSPSEVTVYGSQDVLDSLEFIDGVSLDLSKINKDSDI 294

Query: 292 SKTVSLSSSQVSIEPSVVTVQL 313
+ L I PS VT+ +

Sbjct: 295 EADIPLPDGVKKISPSKVTLHI 316

An alignment of the GAS and GBS proteins is shown below.

Identities = 198/319 (62%), Positives = 251/319 (78%), Gaps = 1/319 (0%)

Query: 1 MKKFFTNKFWLGVVSLFLAILLFLTATATSMNHQDNSKIAGASETYTHTLTDVPIDIKYD 60

MK+F ++ WLG+VS+F AILLFLTA A+S ++ +S+I ETYTH+L DVPID+KYD
Sbjct: 1 MKRFLNSRPWLGMVSVFFAILLFLTA-ASSNHNNSSSQIYSPIETYTHSLKDVPIDMKYD 59

Query: 61 SDDYFISGYSYGADVYMSSVNRVKLDSEINEDTRKFKVVADLTNMKPGTHKVPLKVVNLP 120

SD YFISGYSYGA+VY++S NR+KLDSE+N DTR FK+VADLT+ PGT V L+V NLP
Sbjct: 60 SDKYFISGYSYGAEVYLTSTNRIKLDSEVNNDTRNFKIVADLTHSHPGTVSVNLRVENLP 119

Query: 121 SGVVATVSPTTITVTMGKKKTKEFPVYGHVNDKQIKAGYAVDKMSVDVSKVKVTSDESII 180
SGV ATVSP I+VT+GKK++K FPV G V+ KQI GY + K+ V+KV+VTSDES I

Sbjct: 120 SGVTATVSPDKISVTIGKKESKVFPVRGSVDAKQIANGYEISKIETGVNKVEVTSDESTI 179

Query: 181 DRIDHVAANIPDDKVLDDDFNKTVTLQAVTADGTVLASIIHPSKATLSVKVKKLTKTVPI 240
IDHV A +PDD+VLD +++ VTLQAV+ADGT+LAS I P+K LSV VKK+TK+VPI
Sbjct: 180 ALIDHVVAKLPDDQVLDRNYSSRVTLQAVSADGTILASAIDPAKTNLSVAVKKITKSVPI 239

Query: 241 NLIPVGQFSDSISKINYKLSQEKAVISGTKEALEAISVINAEVDISDVTKNTEKKINLSA 300

+ VG DS+S I YKLS++ AVISG++E LE I I AEV+ISDVTKNT K ++LS+
Sbjct: 240 RVEAVGMMDDSLSDIQYKLSKQTAVISGSREVLEDIDEIIAEVNISDVTKNTSKTVSLSS 299

Query: 301 NNVSVDPAQVTVQLTTTKK 319

+ VS++P+ VTVQLTTTKK
Sbjct: 300 SQVSIEPSVVTVQLTTTKK 318



SEQ ID 4240 (GBS99) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 17 (lane 6; MW 35.7kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 21 (lane 9; MW 60.7kDa).
The GBS99-GST fusion product was purified (Figure 197, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 293), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1384 

A DNA sequence (GBSx1469) was identified in S.agalactiae <SEQ ID 4243> which encodes the amino
acid sequence <SEQ ID 4244>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0503 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1385 

A DNA sequence (GBSx1470) was identified in S.agalactiae <SEQ ID 4245> which encodes the amino
acid sequence <SEQ ID 4246>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.50 Transmembrane 20 - 36 ( 18 - 46)
INTEGRAL Likelihood = -7.64 Transmembrane 48 - 64 ( 42 - 68)
INTEGRAL Likelihood = -3.40 Transmembrane 80 - 96 ( 80 - 96)

Final Results

bacterial membrane Certainty=0.4800 (Affirmative) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11968 GB:Z99105 alternate gene name: ybbQ-similar to
hypothetical proteins [Bacillus subtilis]
Identities = 125/253 (49%), Positives = 186/253 (73%), Gaps = 5/253 (1%)

Query: 27 MDIIIVAVLIYKFIKALAGTKIMSLIQGVILFIIIRFVSEWIGLTTITFLMNQIVTYGVI 86

+DI++V +IYK I + GTK + L++G+++ +++R S+++GL+T+ +LM+Q +T+G +
Sbjct: 16 VDILLVWWIYKLIMVIRGTKAVQLLKGIWIVLVRMASQYLGLSTLQWLMDQAITWGFL 75

Query: 87 AGVVIFAPEIRTGLEKFGRTPQLFTQRSQLSSDE---KLVDALVKAVAYMSPRKIGALIS 143

A ++IF PE+R LE+ GR F RS +E K ++A+ KA+ YM+ R+IGAL++

Sbjct: 76 AIIIIFQPELRRALEQLGRGR--FFSRSGTPVEEAQQKTIEAITKAINYMAKRRIGALLT 133

Query: 144 IERTQTLQEYIATGIPLDADISSELLINIFIPNTPLHDGAVIVKDKKIATACSYLPLSES 203
IER + +YI TGIPL+A +SSELLINIFIPNTPLHDGAVI +K+ +IA A YLPLSES
Sbjct: 134 IERDTGMGDYIETGIPLNAKVSSELLINIFIPNTPLHDGAVIMKNNEIAAAACYLPLSES 193

Query: 204 SSISKEFGTRHRAAIGLSENSDALTVIVSEETGGISVALKGEFLHDLSKDSFEAILRTQL 263
ISKE GTRHRAA+G+SE +D+LT+IVSEETGG+SVA G+ +L++++ + +L +
Sbjct: 194 PFISKELGTRHRAAVGISEVTDSLTIIVSEETGGVSVAKNSDLHRELTEEALKEMLEAEF 253

Query: 264 IQNQEENSKLAWY 276
+N + S WY

Sbjct: 254 KKNTRDTSSNRWY 266

A related DNA sequence was identified in S.pyogenes <SEQ ID 4247> which encodes the amino acid 
sequence <SEQ ID 4248>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.64 Transmembrane 20 - 36 ( 19 - 40)
INTEGRAL Likelihood = -6.21 Transmembrane 48 - 64 ( 47 - 68)
INTEGRAL Likelihood = -2.07 Transmembrane 76 - 92 ( 76 - 92)

Final Results

bacterial membrane Certainty=0.3654 (Affirmative) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ

The protein has homology with the following sequences in the databases:

>GP:BAB03984 GB:AP001507 unknown conserved protein [Bacillus halodurans]
Identities = 117/255 (45%), Positives = 178/255 (68%), Gaps = 6/255 (2%)

Query: 19 PWL-LAVHLLDILIVAYLIYRFIKALTGTKIMSLVQGVIFFLVLRFIAEWIGFTTITYLM 77

PWL +LDIL+V Y+IY+ I + GT+ + L++G+ L++ I+ + T+ +++

Sbjct: 8 PWLNYLTQILDILVVTYVIYKAIMIIRGTRAVQLLRGITVILIVYAISIFFNLRTLGWIV 67

Query: 78 NQVITYGVIAGVVIFTPEIRAGLEKFGRSTQVFLQKQYVSSESAL---VDALIKSVAYMG 134

NQ ITYG++A ++IF PE+R LE+ GR F + + E + +DA++K+ YMG

Sbjct: 68 NQAITYGLLAVIIIFQPELRRALEQLGRGR--FFASRTANEEETMKKTIDAIVKASTYMG 125

Query: 135 PRKIGALIAIEQTQTLQEYIATGIPLNADISSQLLINIFIPNTPLHDGAVIVGQNKIVAA 194
R+IGALI++E+ + +Y+ TGIP+NA+++S+LLIN FIPNTPLHDGAVI+ + I+AA
Sbjct: 126 KRRIGALISMERETGMTDYVETGIPMNANLTSELLINTFIPNTPLHDGAVIINNDTILAA 185

Query: 195 CAYLPLSESKAISKEFGTRHRAAIGLSENSDALTIIVSEETGAISVTRKGQFLHDLSTDE 254

YLPLSE+ ISKE GTRHRAA+G+SE +D LTI+VSEETG IS+T+ G+ DL ++
Sbjct: 186 ACYLPLSENPFISKELGTRHRAALGVSEVTDCLTIVVSEETGHISLTKNGELHRDLDEEQ 245

Query: 255 FETVLRTYLMSNSNV 269

++L L+S + +
Sbjct: 246 LRSLLEAELISEAKM 260

An alignment of the GAS and GBS proteins is shown below.

Identities = 201/283 (71%), Positives = 239/283 (84%), Gaps = 2/283 (0%)

Query: 1 MDIFSAIDSKFWASIMENPWMILIHLMDIIIVAVLIYKFIKALAGTKIMSLIQGVILFII 60
M+ S+ID KF S+ +PW++ +HL+DI+IVA LIY+FIKAL GTKIMSL+QGVI F++
Sbjct: 1 MNNLSSIDIKFLLSLFADPWLLAVHLLDILIVAYLIYRFIKALTGTKIMSLVQGVIFFLV 60

Query: 61 IRFVSEWIGLTTITFLMNQIVTYGVIAGVVIFAPEIRTGLEKFGRTPQLFTQRSQLSSDE 120

+RF++EWIG TTIT+LMNQ++TYGVIAGVVIF PEIR GLEKFGR+ Q+F Q+ +SS+
Sbjct: 61 LRFIAEWIGFTTITYLMNQVITYGVIAGVVIFTPEIRAGLEKFGRSTQVFLQKQYVSSES 120

Query: 121 KLVDALVKAVAYMSPRKIGALISIERTQTLQEYIATGIPLDADISSELLINIFIPNTPLH 180

LVDAL+K+VAYM PRKIGALI+IE+TQTLQEYIATGIPL+ADISS+LLINIFIPNTPLH
Sbjct: 121 ALVDALIKSVAYMGPRKIGALIAIEQTQTLQEYIATGIPLNADISSQLLINIFIPNTPLH 180

Query: 181 DGAVIVKDKKIATACSYLPLSESSSISKEFGTRHRAAIGLSENSDALTVIVSEETGGISV 240

DGAVIV KI AC+YLPLSES +ISKEFGTRHRAAIGLSENSDALT+IVSEETG ISV
Sbjct: 181 DGAVIVGQNKIVAACAYLPLSESKAISKEFGTRHRAAIGLSENSDALTIIVSEETGAISV 240

Query: 241 ALKGEFLHDLSKDSFEAILRTQLIQNQEENSKLAWYNQLLRRK 283
KG+FLHDLS D FE +LRT L+ N N L WY ++L K
Sbjct: 241 TRKGQFLHDLSTDEFETVLRTYLMSN--SNVTLPWYKKILGGK 281

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1386 

A DNA sequence (GBSx1471) was identified in S.agalactiae <SEQ ID 4249> which encodes the amino
acid sequence <SEQ ID 4250>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 33 - 49 ( 33 - 49)

Final Results

bacterial membrane Certainty=0.2041 (Affirmative) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1387 

A DNA sequence (GBSx1472) was identified in S.agalactiae <SEQ ID 4251> which encodes the amino
acid sequence <SEQ ID 4252>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1001 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

A related GBS nucleic acid sequence <SEQ ID 9781> which encodes amino acid sequence <SEQ ID 9782>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC84012 GB:AF080002 UDP-N-acetylmuramyl tripeptide synthetase
MurC [Heliobacillus mobilis]
Identities = 143/442 (32%), Positives = 229/442 (51%), Gaps = 17/442 (3%)

Query: 12 GKSAHYLLSKMGRGST-YPGSLALKFDKDILDTIAKDYE--IVVVTGTNGKTLTTALTVG 68

GK+A +L + G G T +PG + + IL +A+ + +VVTGTNGKT T+ +
Sbjct: 2 GKTAIWLNRRFGHGGTSFPGGIGRRVAPQILTALARQLKRGAMVVTGTNGKTTTSKMLAA 61

Query: 69 ILKEAFGQVVTNPSGANMITGIVSTFLTAKKSKSG--KKIAVLEIDEASLPRITQYIKPS 126
I++++ + N +GAN++ GI + F+ + + ++E+DEA++P++ + ++P

Sbjct: 62 IVEKSSLTLTHNRAGANLVGGITTAFIDSATIGGSITSDLGIIEVDEATIPQLVREVQPK 121

Query: 127 LFVFTNIFRDQMDRYGEIYTTYQMILDGAANAP-QATILANGDSPLFNS--KSVTNPVQF 183
V TN FRDQ+DR+GE+ T+++ PQ++NDPLSK V +

Sbjct: 122 GVVVTNFFRDQLDRFGELDKTVSLVGEALRLLPVQSIAVLNADDPLVASLGKDFPGRVLY 181

Query: 184 YGFNTDKHEPRLAHYNTEGILCPKCQAILTYRLNTYANLGDYTCPNCDFERPNLDYALTR 243
+G+ + R + E CC LTY + LG Y C +C FERP +T
Sbjct: 182 FGIDDRSYGAREMLQSAETRFCRLCGHPLTYDWFFFGQLGHYRCSHCGFERPEPKIKVTG 241

Query: 244 LTHLTNTSSGFVIDGQ----QYNINVGGLYNIYNALAAVSVAEYFGVEPSQIKDGFDKSR 299

+ S F ++ Q ++ G YNIYNA+AA++ A + I+ G R

Sbjct: 242 IQLKGEEGSAFTVETPRGTWQLELSTPGFYNIYNAIAAIASAIRLDLPEKAIRAGLQGYR 301

Query: 300 AVFGRQETFTIGN-KKCTLVLIKNPVGASQALDMIKLAPYPFSLSVLLNANYADGIDTSW 358

FGR E + + ++ L LIKNP G + + + PL V++N N ADG D SW

Sbjct: 302 TNFGRMERIELEDGRRAFLALIKNPTGCDEVIRTLVQNRGPKRLLVIINDNAADGRDISW 361

Query: 359 IWDANFETI--LTMNIPEIFAGGVRHSEIARRLRVTGYDEKRIK-QADKLQDIMTMIEQQ 415

+WDA+FE++ + + +F G+R ++A RL TG I+ +A+ I + +E
Sbjct: 362 LWDADFESLEPVYPELRSVFTSGLRGEDMALRLNYTGIPAESIRYEANVESAIRSALEMT 421

Query: 416 ET-EHAYILATYTAMLEFREIL 436

E E YIL TYTA+LE + L
Sbjct: 422 EPGETLYILPTYTALLESKAAL 443

A related DNA sequence was identified in S.pyogenes <SEQ ID 4253> which encodes the amino acid
sequence <SEQ ID 4254>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ

An alignment of the GAS and GBS proteins is shown below.

Identities = 343/446 (76%), Positives = 393/446 (87%)

Query: 1 MKINTALGVRAGKSAHYLLSKMGRGSTYPGSLALKFDKDILDTIAKDYEIVVVTGTNGKT 60

MK+ T LG+ AGK+A +L+K+GRGSTYPG LAL DKDIL ++KDY+IVVVTGTNGKT
Sbjct: 1 MKMKTLLGIIAGKAAQSILTKLGRGSTYPGRLALACDKDILKDLSKDYDIVVVTGTNGKT 60

Query: 61 LTTALTVGILKEAFGQVVTNPSGANMITGIVSTFLTAKKSKSGKKIAVLEIDEASLPRIT 120

LTTALTVGILKEAFG+++TNPSGANMITGI STFL AKK KS ++IAVLEIDEASLPRIT
Sbjct: 61 LTTALTVGILKEAFGEIITNPSGANMITGITSTFLAAKKGKSERQIAVLEIDEASLPRIT 120

Query: 121 QYIKPSLFVFTNIFRDQMDRYGEIYTTYQMILDGAANAPQATILANGDSPLFNSKSVTNP 180

Y+KPSLFV+TNIFRDQMDRYGEIYTTYQMI+DGA NAP+ATILANGDSP+F+SK + NP
Sbjct: 121 TYLKPSLFVYTNIFRDQMDRYGEIYTTYQMIVDGARNAPKATILANGDSPIFSSKDIVNP 180

Query: 181 VQFYGFNTDKHEPRLAHYNTEGILCPKCQAILTYRLNTYANLGDYTCPNCDFERPNLDYA 240
VQ+YGF+T KH P+LAHYNTEGILCPKC+ IL YRLNTYANLGD+ C NC F+RP LDY

Sbjct: 181 VQYYGFDTAKmPQLAHYNTEGILCPKCEHILQYRLNTYANLGDFVCLNCQFQRPTLDYQ 240

Query: 241 LTRLTHLTNTSSGFVIDGQQYNINVGGLYNIYNALAAVSVAEYFGVEPSQIKDGFDKSRA 300
LT LT +T+ SS FVIDGQ Y INVGGLYNIYNALAAVSVAE+FGV P +IK GF+KS+A
Sbjct: 241 LTELTAITHQSSEFVIDGQNYKINVGGLYNIYNALAAVSVAEFFGVSPEKIKAGFNKSKA 300

Query: 301 VFGRQETFTIGNKKCTLVLIKNPVGASQALDMIKLAPYPFSLSVLLNANYADGIDTSWIW 360

VFGRQETFT+G+K CTL+LIKNPVGASQAL+MI+LA YPFSLSVLLNANYADGIDTSWIW
Sbjct: 301 VFGRQETFTVGDKSCTLILIKNPVGASQALEMIQ1JU3YPFSLSVLLNANYADGIDTSWIW 360

Query: 361 DANFETILTMNIPEIFAGGVRHSEIARRLRVTGYDEKRIKQADKLQDIMTMIEQQETEHA 420

DANFE I M I EI AGGVRHSEIARRLRVTG+D+ +IKQA+KL+ I+ IE+QE +HA
Sbjct: 361 DANFELITQMPITEINAGGVRHSEIARRLRVTGFDDTKIKQAEKLEQIIETIEKQEAKHA 420

Query: 421 YILATYTAMLEFREILANHNAIRKEM 446

YILATYTAMLEFR +LA+ + + KEM
Sbjct: 421 YILATYTAMLEFRSLLADRHVVEKEM 446

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1388 

A DNA sequence (GBSx1473) was identified in S.agalactiae <SEQ ID 4255> which encodes the amino
acid sequence <SEQ ID 4256>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3010 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC84011 GB:AF080002 cobyric acid synthase CobQ [Heliobacillus 
mobilis] 

Identities = 89/250 (35%), Positives = 129/250 (51%), Gaps = 9/250 (3%)

Query: 11
+K TL + HLY +LLN YGD GNI+ ++ E G + SL ++ + +
Sbjct: 2 SKTSNRTLTLIHLYPDLLNLYGDRGNIITLRRRCEWRGITLQVHSASLGEKAAFDDADLV 61

Query: 71
F GGG D EQ ++ +D K G+ +L++CGG+QLLG YY GE + G+
Sbjct: 62

Query: 130
G+ +T + R IG++ E T GFENH GRTFL +PL V G
Sbjct: 122

Query: 186
GNN +D EG YKN G+Y HGP+L +N LA L++ AL +YG + ++E
Sbjct: 181

Query: 243
L + + +G
Sbjct: 241

A related DNA sequence was identified in S.pyogenes <SEQ ID 4257> which encodes the amino acid
sequence <SEQ ID 4258>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence
Final Results 

bacterial cytoplasm Certainty=0.2586 (Affirmative) < succ

bacterial membrane Certainty=0.0000 (Not Clear) < succ

bacterial outside Certainty=0.0000 (Not Clear) < succ

An alignment of the GAS and GBS proteins is shown below.

Identities = 197/260 (75%), Positives = 224/260 (85%)

Query: 1 MTYTSLKSPTTJCDYKYTLNVAHLYGNLLNTYGDNGNILMMKYVGEKLGCQMTFDIVSLED 60 

MTYTSLKSP +DY Y L +AHLYGNL+NTYGDNGNILM+KYV EKLG ++T DIVS+ D 
Sbjct: 1 MTYTSLKSPENQDYIYDLTIAHLYGNLMNTYGDNGNIIjMLKYVAEKLGARvTVDIVSIND 60 

Query: 61 RFDPNYYQMAFFGGGQDYEQAIVARDLPSKKEDINKFIQNNGVVLAICGGFQLLGQYYIQ 120 

F+ + Y + FFGGGQDYEQ+ 1 VA+DLPSKK + +1 NN WLAI CGGFQLLGQYY+Q 
Sbjct: 61 TFEQDDYDIVFFGGGQDYEQSIVAKDLPSKKAALADYIANNKVVLAICGGFQLLGQYYVQ 120 
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Query: 121 MIGERIEGIGVMGHYTLNQNMJRYIGDIKIHNDEFNETYyGFENHQGRTFLSEDEKPLGT 180 

ANG +I+G+G+MGHYTLNQ+ NR+IGDIKIHNDEFNETYYGFENHQGRTFLS DEKPLG 
Sbjct: 121 ANGVKIDGLGIMGHYTLNQHQNRFIGDIKIHNDEFNETYYGFENHQGRTFLSGDEKPLGR 180 

Query: 181 VIYGNGNNKEDGTEGVHYKNVFGSYFHGPILSRNANIAYRLVATALRNKYGKEIVLPSYE 240

V+YGNGNNKED TEGVHYKNV+GSYFHGPILSRN NLAYRLV TAL+ KYG I LPSY+ 
Sbjct: 181 VWGNGNNKEDQTEGVHYKNVYGSYFHGPILSRNWLAYRLVTTALKKKYGSAISLPSYD 240 

Query: 241 EILSLEIPEEYGDVKSKADF 260

+IL EI EEY D+KSKA F 
Sbjct: 241 DILKQEITEEYADLKSKASF 260 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1389 

A DNA sequence (GBSx1474) was identified in S.agalactiae <SEQ ID 4259> which encodes the amino
acid sequence <SEQ ID 4260>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1701 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04402 GB:AP001509 lipoate-protein ligase [Bacillus halodurans] 
Identities = 153/316 (48%), Positives = 212/316 (66%), Gaps = 3/316 (0%) 



Query: 10 DPAYNVALEAYAFQKLTDIDEIFIL-WINEPAIIIGRHQNTIQEINKEFIDKNGIHWRR 68
DP N+A+E YA + L DI+E ++L +INEP+IIIGR+QNTI+EIN E+++ NGIHWRR
Sbjct: 11 DPRINLAIEEYALKNL-DINETYLLFYINEPSIIIGRNQNTIEEINTEYVESNGIHWRR 69

Query: 69 LSGGGAVYHDLNNLNYTIISNNTQEGAFDFQTFSKPVIDTLAKLGVKAEFTGRNDL-EIN 127
LSGGGAVYHD NLN++ I+ + E +FQ F+ PVI LAKLGV AE GRND+ +
Sbjct: 70 LSGGGAVYHDHGNLNFSFITKDDGESFSNFQKFTDPVIKALAKLGVTAELKGRNDIIASD 129

Query: 128 GQKFAGNAQAYYKGRMMHHGCLLFDVDMSVLGQALKVSKDKIESKGIKSVRARVTNIVDH 187
G+K +GNAQ KGRM HG LLFD ++ + AL VSKDKIESKGIKS+R+RV NI +
Sbjct: 130 GRKISGNAQFSTKGRMFSHGTLLFDSEIDHWSALNVSKDKIESKGIKSIRSRVANISEF 189

Query: 188 LSDKITVQEFSDAILAQMKEEYPEMDEYVLSDAELSEIQAMRDNQFATWDWTYGKAPEYT 247
L++KI++ +F +L ++ + EY L+ + +EI + ++ WDW YGK+P +
Sbjct: 190 LTEKISIDQFRSLLLESIFDGQANIQEYKLTADDWAEIHELSKERYQNWDWNYGKSPAFN 249

Query: 248 IERGVRYPAGKITTYANVENSTIKSVKIFGDFFGVKPVDDIEKMLEGVRYDYKDVLAALK 307
++ R+P G I V+ TI+ KIFGDFFG V D+E L G+RY+ D+ AL
Sbjct: 250 LQHSHRFPVGNIDIRLEVKGGTIQQCKIFGDFFGTGDVRDLEDRLVGIRYERADIEQALA 309

Query: 308 TVDTSQYFSRMTPEEI 323
VD YF ++ ++I
Sbjct: 310 DVDVKTYFGQVEKDDI 325


A related DNA sequence was identified in S.pyogenes <SEQ ID 4261> which encodes the amino acid
sequence <SEQ ID 4262>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1271 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 249/328 (75%), Positives = 292/328 (88%)

Query: 1 MKYIVNTSNDPAYNVALEAYAFQKLTDIDEIFILWINEPAIIIGRHQNTIQEINKEFIDK 60
MKYIVN S++PA+N+ALEAYAF++L + DE+FILWINEPAIIIG+HQNTIQEINKE+ID+
Sbjct: 1 MKYIVNKSHNPAFNIALEAYAFRELVEEDELFILWINEPAIIIGKHQNTIQEINKEYIDE 60

Query: 61 NGIHWRRLSGGGAVYHDLNNLNYTIISNNTQEGAFDFQTFSKPVIDTLAKLGVKAEFTG 120
+GIHVTORLSGGGAVYHDLNNLNYTIISN T EGAFDF+TFS+PVI TLA LGV A FTG
Sbjct: 61 HGIHVTORLSGGGAVYHDLNNLNYTIISNKTAEGAFDFKTFSQPVIATLADLGVTANFTG 120

Query: 121 RNDLEINGQKFAGNAQAYYKGRMMHHGCLLFDVDMSVLGQALKVSKDKIESKGIKSVRAR 180

RND+EI+G+K GNAQAYYKGRMMHHGCLLFDVDM+VLG ALKVSKDKIESKG+KSVRAR
Sbjct: 121 RNDIEIDGKKICGNAQAYYKGRMMHHGCLLFDVDMTVLGDALKVSKDKIESKGVKSVRAR 180

Query: 181 VTNIVDHLSDKITVQEFSDAILAQMKEEYPEMDEYVLSDAELSEIQAMRDNQFATWDWTY 240
VTNI++ L +KITV+EFSD ILA+MKE YP+M EYVLS+ EL++I+ QF +WDWTY

Sbjct: 181 VTNILNELPEKITVEEFSDKIIAKMKETYPDMTEYVLSEDEIAKIEQSAKEQFGSWDWTY 240

Query: 241 GKAPEYTIERGVRYPAGKITTYANVENSTIKSVKIFGDFFGVKPVDDIEKMLEGVRYDYK 300
GKAPEYTIER VRYPAGKI +T+ANVENS IK++KI+GDFFG+K V DIE +L G +Y+Y+
Sbjct: 241 GKAPEYTIERNVRYPAGKISTFANVENSIIKNLKIYGDFFGIKDVQDIENLLIGCKYEYR 300

Query: 301 DVLAALKTVDTSQYFSRMTPEEITKAIV 328

DV LKT+DT+QYFSRMT EE+ KAIV
Sbjct: 301 DVFERLKTIDTTQYFSRMTVEEVAKAIV 328

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1390 

A DNA sequence (GBSx1475) was identified in S.agalactiae <SEQ ID 4263> which encodes the amino
acid sequence <SEQ ID 4264>. Analysis of this protein sequence reveals the following:

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.70 Transmembrane 294 - 310 (294 - 312)

Final Results

bacterial membrane Certainty=0.1680 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA21748 GB:L31844 dihydrolipoamide dehydrogenase [Clostridium
magnum]

Identities = 229/589 (38%), Positives = 339/589 (56%), Gaps = 25/589 (4%)

Query: 1 MAFDVIMPKLGVDMQEGEILEWKKNEGDTVNEGDVLLEIMSDKTNMEIEAEDTGVLLKIV 60

MA V+MPKLG+ M EG ++ WKK EGD V G++L E+ +DK E+E+ D G++ K++
Sbjct: 1 MAKIVVMPKLGLTMTEGTLVTWKKAEGDQVKVGEILFEVSTDKLTNEVESSDEGIWKLL 60

Query: 61 HQAGDWPVTEVIAYIGEEGEEVGTSSPSADATITAEDGQSVSGPAAPSQETVAAATPKE 120 
GDW +A IG E++ + +G S +A +T A PK+

Sbjct: 61 VNEGDWECLNPVAI IGSADEDISSLL NGSSEGSGSAEQSDTKA PKK 107

Query: 121 ELAADEY--DIVWGGGPAGYYAAIRGAQLGGKIAIVEKTEFGGTCLNVGCIPTKTYLKN 178
E+ A + ++W+GGGP GY AAIR AQLG K+ ++EK GGTCLNVGCIPTK L +
Sbjct: 108 EVEAWGGDNLVVIGGGPGGYVAAIRAAQLGAKVTLIEKESLGGTCLNVGCIPTKVLLHS 167
Query: 179 AEILDGLKVAAGRGINIASTOTAIDMDKTV^ 238

+++ L +K GI++ + ++ K V+K L GV GLL NKV++ G 

Sbjct: 168 SQLLTEMKEGDKLGIDIEGS-IVVNWKHIQKRKKIVIKKLVSGVSGLLTCNKVKVIKGTA 226

Query: 239 QVNPDKSWIGDK VIKGRNWLATGSKVSRINIPGIESPLVLTSDDILDLREIPK 293

+ ++++ + + N ++ATGS I G + V+ S L L P+ 

Sbjct: 227 KFESKDTILVTKEDGVAEKVNFDNAIIATGSMPFIPEIEGNKLSGVIDSTGALSLESNPE 286 

Query: 294 SLAVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKILAKKGMKIKTS 353
S+A++GGGV+G+E ++ S G V++IEM I+P MD+E+S + L + G+ I +

Sbjct: 287 SIAIIGGGVIGVEFASIFNSLGCKVSIIEMLPHILPPMDREISEIAKAKLIRDGININNN 346

Query: 354 VGVSEIVEANNQLTLKL--NNGEEVV-ADKALLSIGRVPQMNGLENLEPELEMERGRIKV 410
V+ I + + h + + GEE + +K L+++GR + GL+ + ++ E G I V
Sbjct: 347 CKVTRIEQGEDGLKVSFIGDKGEESIDVEKVLIAVGRRSNIEGLDVEKIGVKTEGGSIIV 406

Query: 411 NAYQETSIPGIYAPGDVNGTRMLAHAAYRMGEVAAENALGGNKRKAHLDFTPAAVYTHPE 470 

N ET++ GIYA GD G MLAH A G VAAEN +G NK K PA VYT PE 

Sbjct: 407 NDKMETNVEGIYAIGDCTGKIMIAHVASDQGWAAENIMGQNK-K^YKTVPACVYTKPE 465 

Query: 471 VAMVGMTEEQAREQYGDILVGKNSFTGNGRAIASNEAHGFVKVIAEPKYKEILGVHIIGP 530
+A VG+TEEQA+E+ D VGK NG+++ NE G +K+I + KY+EILGVHI+GP 

Sbjct: 466 LASVGLTEEQAKEKGIDYKVGKFQLAANGKSLIMNETGGVIKIITDKKYEEILGVHILGP 525 

Query: 531 AAAELINEASTIMENELTVYDVAQSIHGHPTFSEVMYEAFLDVLGEAIH 579

A +LI EA+ + E T+ ++ ++H HPT E M EA L V +AIH 
Sbjct: 526 RATDLITEAALALRLEATLEEIITTVHAHPTOGEAMKEAALAVNNQAIH 574 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1819> which encodes the amino acid 
sequence <SEQ ID 1820>. Analysis of this protein sequence reveals the following:
Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.70 Transmembrane 297 - 313 (297 - 315)

Final Results

bacterial membrane Certainty=0.1680 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 497/591 (84%), Positives = 538/591 (90%), Gaps = 10/591 (1%)

Query: 1 MAFDVIMPKLGVDMQEGEILEWKKNEGDTVNEGDVLLEIMSDKTNMEIEAEDTGVLLKIV 60
MA ++IMPKLGVDMQEGEI+EWKK EGDTVNEGD+LLEIMSDKTNME+EAED+GVLLKI
Sbjct: 1 MAVEIIMPKLGVDMQEGEIIEWKKQEGDTVNEGDILLEIMSDKTNMELEAEDSGVLLKIT 60

Query: 61 HQAGDWPVTEVIAYIGEEGEEVGTSSPSA DATITAEDGQS--VSGPAAPSQETVAA 115

QAG+ VPVTEVI YIG EGE V SSP+A + T ED ++ + P AP+Q A+
Sbjct: 61 RQAGETVPVTEVIGYIGAEGESVEVSSPAASDVNVARTTEDLEAAGLEVPKAPAQ--AAS 118

Query: 116 ATPKEELAADEYDIVWGGGPAGYYAAIRGAQLGGKIAIVEKTEFGGTCLNVGCIPTKTY 175 

A PK LA DEYDI+WGGGPAGYYAAIRGAQLGGKIAIVEK+EFGGTCLNVGCIPTKTY 
Sbjct: 119 AAPKAALADDEYDIIWGGGPAGYYAAIRGAQLGGKIAIVEKSEFGGTCLNVGCIPTKTY 178 

Query: 176 LKNAEILDGLKVAAGRGINIASTNYAIDMDKTVAFKNS\nnCTLTGGWGLLKANKVEIFN 235

LKNAEILDG+K+AAGRGINLASTNY IDMDKTV FKN+WKTLTGGV+GLLKANKV I FN - 
Sbjct: 179 LKNAEILDGIKIAAGRGINLASTNYTIDMDKTVDFKNTWKTLTGGVQGLLKANKVTIFN 238 

Query: 236 GLGQVNPDKSWIGDKVIKGRNWLATGSKVSRINIPGIESPLVLTSDDILDLREIPKSL 295 
GLGQVNPDK+V IG + IKGRNV+LATGSKVSRINIPGI+S LVLTSDDILDLRE+PKSL

Sbjct: 239 GLGQVNPDKTVTIGSQTIKGRNVILATGSKVSRINIPGIDSKLVLTSDDILDLREMPKSL 298

Query: 296 AVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKILAKKGMKIKTSVG 355
AVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKIL+KKGMKIKTSVG
Sbjct: 299 AVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKILSKKGMKIKTSVG 358
Query: 356 VSEIVEANNQLTLKLNNGEEVVADKALLSIGRVPQMNGLENLEPELEMERGRIKVNAYQE 415

VSEIVEANNQLTLKLNNGEEVVA+KALLSIGRV QMNGLENL LEM+R RIKVN YQE
Sbjct: 359 VSEIVEANNQLTLKLNNGEEVVAEKALLSIGRVSQMNGLENL--NLEMDRNRIKVNDYQE 416

Query: 416 TSIPGIYAPGDVNGTRMLAHAAYRMGEVAAENALGGN-KRKAHLDFTPAAVYTHPEVAMV 474

TSIPGIYAPGDVNGT+MLAHAAYRMGEVAAENA+ GN RKA+L +TPAAVYTHPEVAMV
Sbjct: 417 TSIPGIYAPGDVNGTKMLAHAAYRMGEVAAENAMHGOTTRKANLKYTPAAVYTHPEVAMV 476

Query: 475 GMTEEQAREQYGDILVGKNSFTGNGRAIASNEAHGFVKVIAEPKYKEILGVHIIGPAAAE 534

G+TEEQAREQYGD+L+GKNSFTGNGRAIASNEAHGFVKVIA+ KY EILGVHIIGPAAAE
Sbjct: 477 GLTEEQAREQYGDVLIGKNSFTGNGRAIASNEAHGFVKVIADAKYHEILGVHIIGPAAAE 536

Query: 535 LINEASTIMENELTVYDVAQSIHGHPTFSEVMYEAFLDVLGEAIHNPPKRK 585
+INEA+TIME+ELTV ++ SIHGHPTFSEVMYEAF DVLGEAIHNPPKRK

Sbjct: 537 MINEAATIMESELTVDELLLSIHGHPTFSEVMYEAFADVLGEAIHNPPKRK 587

SEQ ID 4264 (GBS681) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 165 (lane 2; MW 68.3kDa) and in Figure 188 (lane 10; MW 68kDa). 

Purified GBS681-His is shown in Figure 240, lanes 5-6.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1391 

A DNA sequence (GBSx1476) was identified in S.agalactiae <SEQ ID 4265> which encodes the amino
acid sequence <SEQ ID 4266>. This protein is predicted to be a dihydrolipoamide acetyltransferase. Analysis
of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.4466 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04497 GB:AP001509 dihydrolipoamide S-acetyltransferase
[Bacillus halodurans]
Identities = 187/462 (40%), Positives = 266/462 (57%), Gaps = 26/462 (5%)

Query: 1 MAVEIIMPKLGVDMQEGEILEWKKQVGDVVNEGDVLLEIMSDKTNMEIEAEDSGVLLKIT 60

MA EI MPKL MQEG +L+W K+ GD V G+ L EIM+DK N+E+EA + G LLK
Sbjct: 1 MAKEIFMPKLSSTMQEGTLLQWFKEEGDRVEVGEPLFEIMTDKINIEVEAYEEGTLLKRY 60

Query: 61 HGNGDWPVTETIGYIGAEGEEVTEASSSENTSVEENATQVTSEPEKVEETSEPSVPAAT 120
+G D +PV IGYIG E V +E E T E T+ P++

Sbjct: 61 YGEDDEIPVNHVIGYIGTPDESVP TEPPGASEITASSTDEAGDHRTTAVKKAPSSD 116

Query: 121 SGEKVRATPAARKLAREMSIDLALVSGTGANGRVHREDVENFKGAQPRITPLARRIAEDQ 180
E VRATPAAR++A+E IDL V G+G GRV DV FK + TPLA+++AE +
Sbjct: 117 R-ENVRATPAARRIAKEKRIDLRQVEGSGPEGRVQAvnVATFKKKGQKATPIAKKVAEVK 175

Query: 181 GVDIAEITGSGIRGKIVKNDVLAAMSPQAAEAPVETKATPTTEEKQLPEGVEVIKMSAMR 240

GV + ++ GSG GK+ + DV A A +PVE K +K+S +R 

Sbjct: 176 GVALEKVQGSGPYGKVYREDVEHAQ AASPVEDKGNR VKLSGLR 218 



Query: 241 KAISKGMTNSYLTAPSFTLNYDIDMTEMMALRKKLIDPIMAKTGLKVSFTDLIGMAVVKT 300

K ++K M +S +AP T+ +IDM+ + +R +L+ I +TG ++S+T+++ AV 
Sbjct: 219 KWAKR^WDSAFSAPHVTITTEIDMSSTIKIRSQI^LGMIEQETGYRLSYTEIVMKAVAHA 278 
Query: 301 LMKPEHRYLNASLINDAQEIELHNFVNIGIAVGLDDGLIVPVVHNADQMSLSDFVIASKD 360

LM H +NAS + EI H V+IG+AV ++ GL+VPW + D+ L+ K 
Sbjct: 279 LMS--HPTINASFFEN--EIVYHEDVHIGmVAVEGGLVVPVVKHVDKKGLAQLTNECKT 334 

Query: 361 VIKKTQEGKLKSAEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGATIPTPTVVDGE 420 

V ++ +L MSG TF+I+NLGM+ F P+INQP SAILGVG P +DG+ 

Sbjct: 335 VAMAARDNRLSQEMMSGGTFTISNLGMYAIDVFTPVINQPESAILGVGRIQEKPVGIDGQ 394 

Query: 421 IVARPIMAMCLTIDHRIVDGMNGAKFMVDLKNLMENPFGLLI 462 

I RP+M L+ DHR++DG A F+ D+K+++E PF LL+ 
Sbjct: 395 IELRPMMTASLSFDHRVIDGAPAAAFLTDVKSMLEQPFQLLM 436 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4267> which encodes the amino acid 
sequence <SEQ ID 4268>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4774 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 354/473 (74%), Positives = 390/473 (81%), Gaps = 15/473 (3%) 



Query: 1 MAVEIIMPKLGVDMQEGEILEWKKQVGDVVNEGDVLLEIMSDKTNMEIEAEDSGVLLKIT 60
MA EIIMPKLGVDMQEGEI+EWKKQ GD VNEGD+LLEIMSDKTNME+EAEDSGVLLKIT
Sbjct: 1 MAFEIIMPKLGVDMQEGEIIEWKKQEGDTVNEGDILLEIMSDKTNMELEAEDSGVLLKIT 60

Query: 61 HGNGDWPVTETIGYIGAEGEEVTEASSSENTS VEENATQVTSEPEKVEETSEPS 115
GD VPVTE IGYIGAEGE V +SSE T+ +A + E V + P
Sbjct: 61 RQAGDTVPVTEVIGYIGAEGESVDTIASSEKTTEIPVPASADAGPAVAPKENVASPA-PQ 119

Query: 116 VPAAT SGEKVRATPAARKLAREMSIDLALVSGTGANGRVHREDVENFKGAQPRITP 171
V A +G KVRATPAARK A EM IDL V GTG GRVH+EDVENFKGAQP+ +P
Sbjct: 120 VAATAIPQGNGGKVRATPAARKAAAEMGIDLGQVPGTGPKGRVHKEDVENFKGAQPKASP 179

Query: 172 LARRIAEDQGVDIAEITGSGIRGKIVKNDVLAAMSPQAAEAPVETKATPTTEEK--QLPE 229
LAR+IA D+G+D+A ++G+G GK++K D++A + A P E KA EEK LPE
Sbjct: 180 LARKIAADKGIDLATVSGTGFNGKVMKEDIMAILE---AAKPAEAKAPAAKEEKVVDLPE 236

Query: 230 GVEVIKMSAMRKAISKGMTNSYLTAPSFTLNYDIDMTEMMALRKKLIDPIMAKTGLKVSF 289
GVE MSAMRKAISKGMTNSYLTAP+FTLNYDIDMTEM+ALRKKLIDPIMAKTGLKVSF
Sbjct: 237 GVEHKPMSAMRKAISKGMTNSYLTAPTFTLNYDIDMTEMIALRKKLIDPIMAKTGLKVSF 296

Query: 290 TDLIGMAVVKTLMKPEHRYLNASLINDAQEIELHNFVNIGIAVGLDDGLIVPVVHNADQM 349
TDLIGMAVVKTLMKPEH Y+NASLINDA +IELH FVN+GIAVGLDDGLIVPV+H A++M
Sbjct: 297 TDLIGMAVVKTLMKPEHEYMNASLINDANDIELHRFVNLGIAVGLDDGLIVPVIHGANKM 356

Query: 350 SLSDFVIASKDVIKKTQEGKLKSAEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGA 409
LSDFV+ASKDVIKK Q GKLK+AEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGA
Sbjct: 357 CLSDFVLASKDVIKKAQTGKLKAAEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGA 416

Query: 410 TIPTPTVVDGEIVARPIMAMCLTIDHRIVDGMNGAKFMVDLKNLMENPFGLLI 462
TIPTPTVVDGEIV+RPIMAMCLTIDHR+VDGMNGAKFMVDLK LMENPF LLI
Sbjct: 417 TIPTPTVVDGEIVSRPIMAMCLTIDHRLVDGMNGAKFMVDLKKLMENPFELLI 469

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1392

A DNA sequence (GBSx1477) was identified in S.agalactiae <SEQ ID 4269> which encodes the amino
acid sequence <SEQ ID 4270>. This protein is predicted to be an acetoin dehydrogenase (TPP-dependent)
beta chain (pdhB). Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1267 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9779> which encodes amino acid sequence <SEQ ID 9780> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04496 GB:AP001509 acetoin dehydrogenase (TPP-dependent) beta
chain [Bacillus halodurans]
Identities = 189/319 (59%), Positives = 249/319 (77%), Gaps = 1/319 (0%)

Query: 11 EAINVAMSEEMRKDEKVFLMGEDVGVYGGDFGTSVGMLEEFGAKRVRDTPISEAAIAGSA 70

EAI AM+ EMRK+E VF++GED+GVYGG FG + GM+EEFG++RVR+TPISEAAI+G+A
Sbjct: 8 EAIREAMTLEMRKNEDVFILGEDIGVYGGAFGVTRGMIEEFGSERVRNTPISEAAISGTA 67

Query: 71 IGAAQTGLRPIVDLTFMDFVTIAMDAIVNQGAKTNYMFGGGLSTPVTFRVASGSGIGSAA 130
IGAA TG+RPI++L F DF+TIAMD +VNQ AK YM+GG P+ R +GSG G+AA

Sbjct: 68 IGAALTGMRPILELQFSDFITIAMDNMVNQAAKLRYMYGGKAKVPMVLRTPAGSGTGAAA 127

Query: 131 QHSQSLEAWLTHIPGLKWAPGTVNESKALLKSSILDNNPVIFLEPKALYGKKEEVNMDP 190
QHSQSLEAW+THIPGLKW P T ++K LLK++I DNNPVIF E K Y K V +
Sbjct: 128 QHSQSLEAWMTHIPGLKWQPATAYDAKGLLKAAIDDNNPVIFYEHKLCYRTKCHV-PEE 186

Query: 191 DFYIPLGKGDIKREGTDLTIVSYGRMLERVMQAAEEVAEEGINVEVVDPRTLIPLDKELI 250

++ IPLGK D+KR+GTD+T+V+ M+ + ++AA E+ +EGI+VEV+DPRTL+PLD+E I
Sbjct: 187 EYSIPLGKADVKRKGTDVTWATAVMVHKALEAAVELEKEGISVEVIDPRTLVPLDEETI 246

Query: 251 IDSVKKTGKLILVNDAYKTGGFTGEIATMVAESEAFDYLDHPIVRLASEDVPVPYSRVLE 310

I SVKKT +LI+V++A K GGF GEIA+++AESEAFDYLD PI RL + VP+PY+ LE
Sbjct: 247 IRSVKKTSRLIVVHEAVKRGGFGGEIASIIAESEAFDYLDAPIKRLGGKPVPIPYNPTLE 306

Query: 311 QGILPDVAKIKDAIYKWN 329

+ +P V I +A+ + +N
Sbjct: 307 RAAIPQVPDIIEAVKETLN 325

A related DNA sequence was identified in S.pyogenes <SEQ ID 4271> which encodes the amino acid
sequence <SEQ ID 4272>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.00 Transmembrane 81 - 97 (81 - 97)

Final Results

bacterial membrane Certainty=0.1001 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB04496 GB:AP001509 acetoin dehydrogenase (TPP-dependent) beta 
chain [Bacillus halodurans] 
Identities = 187/319 (58%) , Positives = 244/319 (75%) , Gaps = 1/319 (0%) 
Query: 11 EAVNLAMTEEMRKDENIFLMGEDVGVYGGDFGTSVGMIEEFGPKRVKDTPISEAAISGAA 70

EA+ AMT EMRK+E++F++GED+GVYGG FG + GMIEEFG +RV++TPISEAAISG A
Sbjct: 8 EAIREAMTLEMRKNEDVFILGEDIGVYGGAFGVTRGMIEEFGSERVRNTPISEAAISGTA 67

Query: 71 IGAAITGLRPIVDVTFMDFLTI^1MDAIVNNGAKNNYMFGGGLITPVTFRVASGSGIGSAA 130

IGAA+TG+RPI+++ F DF+TI MD +VM AK YM+GG P+ R +GSG G+AA 
Sbjct: 68 IGAALTGMRPILELQFSDFITIAMDNMVNQAAKLRYMYGGKAKVPMVLRTPAGSGTGAAA 127 

Query: 131 QHSQSLEAWLTHIPGIKWAPGNANDAKGLLKSAIRDNNIVLFMEPKALYGKKEEVNQDP 190 

QHSQSLEAW+THIPG+KW P A DAKGLLK+AI DNN V+F E K Y K V ++ 
Sbjct: 128 QHSQSLEAWMTHIPGLKWQPATAYDAKGLLKAAIDDNNPVIFYEHKLCYRTKCHVPEE- 186

Query: 191 DFYIPLGKGDIKREGTDLTIVSYGRMLERVLQAAEEVARDGINVEVVDPRTLIPLDKELI 250

++ IPLGK D+KR+GTD+T+V+ M+ + L+AA E+ +GI+VEV+DPRTL+PLD+E I
Sbjct: 187 EYSIPLGKADVKRKGTDVTWATAVMVHKALEAAVELEKEGISVEVIDPRTLVPLDEETI 246

Query: 251 IESVKKTGKLMLVNDAYKTGGFIGEIATMITESEAFDYLDHPIVRLASEDVPVPYARVLE 310

I SVKKT +L++V++A K GGF GEIA++I ESEAFDYLD PI RL + VP+PY LE
Sbjct: 247 IRSVKKTSRLIVVHEAVKRGGFGGEIASIIAESEAFDYLDAPIKRLGGKPVPIPYNPTLE 306

Query: 311 QAILPDVEKIKAAIVKMAN 329

+A +P V I A+ + N
Sbjct: 307 RAAIPQVPDIIEAVKETLN 325

An alignment of the GAS and GBS proteins is shown below. 

Identities = 286/331 (86%), Positives = 310/331 (93%)

Query: 1 MSETKOTIALREAINVAMSEEMRKDEKVFLMGEDVGVYGGDFGTSVGMLEEFGAKRVRDTP 60

MSETK+MALREA+N+AM+EEMRKDE +FLMGEDVGVYGGDFGTSVGM+EEFG KRV+DTP
Sbjct: 1 MSETKLMALREAVNLAMTEEMRKDENIFLMGEDVGVYGGDFGTSVGMIEEFGPKRVKDTP 60

Query: 61 ISEAAIAGSAIGAAQTGLRPIVDLTFMDFVTIAMDAIVNQGAKTNYMFGGGLSTPVTFRV 120

ISEAAI+G+AIGAA TGLRPIVD+TFMDF+TI MDAIVN GAK NYMFGGGL TPVTFRV
Sbjct: 61 ISEAAISGAAIGAAITGLRPIVDVTFMDFLTIIWI^IVIMGAKNNYMFGGGLITPVTFRV 120

Query: 121 ASGSGIGSAAQHSQSLEAWLTHIPGLKWAPGTVNESKALLKSSILDNNPVIFLEPKALY 180

ASGSGIGSAAQHSQSLEAWLTHIPG+KWAPG N++K LLKS+I DNN V+F+EPKALY 
Sbjct: 121 ASGSGIGSAAQHSQSLEAWLTHIPGIKWAPGNANDAKGLLKSAIRDNNIVLFMEPKALY 180 

Query: 181 GKKEEVNMDPDFYIPLGKGDIKREGTDLTIVSYGRMLERVMQAAEEVAEEGINVEVVDPR 240 

GKKEEVN DPDFYIPLGKGDIKREGTDLTIVSYGRMLERV+QAAEEVA +GINVEVVDPR
Sbjct: 181 GKKEEVNQDPDFYIPLGKGDIKREGTDLTIVSYGRMLERVLQAAEEVAADGINVEVVDPR 240

Query: 241 TLIPLDKELIIDSVKKTGKLILVNDAYKTGGFTGEIATMVAESEAFDYLDHPIVRLASED 300 

TLIPLDKELII+SVKKTGKL+LVNDAYKTGGF GEIATM+ ESEAFDYLDHPIVRLASED 
Sbjct: 241 TLIPLDKELIIESVKKTGKLMLVNDAYKTGGFIGEIATMITESEAFDYLDHPIVRLASED 300 

Query: 301 VPVPYSRVLEQGILPDVAKIKDAIYKWNKG 331 

VPVPY+RVLEQ ILPDV KIK AI K+ NKG 
Sbjct: 301 VPVPYARVLEQAILPDVEKIKAAIVKMANKG 331 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1393 

A DNA sequence (GBSx1478) was identified in S.agalactiae <SEQ ID 4273> which encodes the amino
acid sequence <SEQ ID 4274>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.03 Transmembrane 161 - 177 (161 - 178)

Final Results

bacterial membrane Certainty=0.2211 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9777> which encodes amino acid sequence <SEQ ID 9778>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04495 GB:AP001509 acetoin dehydrogenase (TPP-dependent) alpha 
chain [Bacillus halodurans] 
Identities = 148/317 (46%) , Positives = 214/317 (66%) , Gaps = 1/317 (0%) 

Query: 8 LSKEQHLDMFLKMQRIRDVDMKFNKLVRRGFVQGMTHFSVGEEAASVGAIQDLTDSDIIF 67
+++++ +D+F +M IR + K ++ +G + G TH +VG+EA++VG+I L + D +
Sbjct: 10 MTEKKLVDLFKQMWLIRYFEEKVDEFFAKGMIHGTTHLAVGQEASAVGSIAVLEERDKLT 69

Query: 68 SNHRGHGQTIAKGIDIGGMFAELAGKATGTSKGRGGSMHLANLEKGNYGTNGIVGGGYAL 127
S HRGHG IAKG D+ M AEL G+ TG KG+GGSMH+A++E+GN G NGIVGGG+++
Sbjct: 70 STHRGHGHCIAKGADVNRMMAELFGRETGYCKGKGGSMHIADVERGNLGANGIVGGGFSI 129

Query: 128 AVGAALTQQYEGTDNIVIAFSGDSATNEGSFHESVNLAAVWNLPVIFFIINNRYGISTDI 187
A GAALT + + +V+ F GD A+NEGSFHE+VNLA++W LPV+F NN+YG+S +
Sbjct: 130 ATGAALTSKMKKEGYWLCFFGDGASNEGSFHEAVNLASIWKLPWFICENNQYGMSGSV 189

Query: 188 TYSTKIPHLYMRADAYGIPGHYVEDGNDLMAVYEKMHEVINYVRSGNGPAIVEVESYRWF 247
I H+ RA YGIPG V DGND+ AV + ++ R G GP IVE ++YRW
Sbjct: 190 KEMINIEHISDRAAGYGIPG-MWDGNDVFAVMNWGRAVDRARRGEGPTIVEAKTYRWK 248

Query: 248 GHSTADAGVYRTKEEVDSWKAKDPVKRYRAYLIENEIATEEEIAAIEAQVIKEVEEGVKF 307
GHS +DA YRT+EE W+ KDP+ R RA L++ I TEEE +I+ + +++E+ V+F
Sbjct: 249 GHSKSDAKKYRTREEEKEWREKDPIARLRATLWEGIVTEEEADSIQEEAKQKIEDSVQF 308

Query: 308 AEESPFPDMSVAFEDVF 324

A SP P++ EDV+
Sbjct: 309 ARNSPEPEIESLLEDVY 325

A related DNA sequence was identified in S.pyogenes <SEQ ID 4275> which encodes the amino acid
sequence <SEQ ID 4276>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3502 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 244/326 (74%), Positives = 278/326 (84%)

Query: 1 METOMVTLSKEQHLDMFLKMQRIRDVDMKFNKLVRRGFVQGMTHFSVGEEAASVGAIQDL 60
ME MVT+SKEQHLDMFLKM+RIR+ D + NKLVRRGFVQGMTHFSVGEEAA+VGA+ L
Sbjct: 1 MEAEMVTVSKEQHLDMFLKMERIREFDSRINKLVKRGFVQGMTHFSVGEEAANVGAVAHL 60

Query: 61 TDSDIIFSNHRGHGQTIAKGIDIGGMFAELAGKATGTSKGRGGSMHLANLEKGNYGTNGI 120

+ DIIFSNHRGHGQ+IAK +D+ M AELAGKATG SKGRGGSMHLA+ EKGNYGTNGI
Sbjct: 61 SYDDIIFSNHRGHGQSIAKDMDENKMMAELAGKATGVSKGRGGSMHLADFEKGNYGTNGI 120

Query: 121 VGGGYALAVGAALTQQYEGTDNIVIAFSGDSATNEGSFHESVNLAAVWNLPVIFFIINNR 180

VGGGYALAVGAALTQQY+GT+NI +AFSGD ATNEGSFHESVN+AA W LPVIFFIINNR
Sbjct: 121 VGGGYALAVGAALTQQYKGTNNIAVAFSGIX^TNEXSSFHESVNMAATWKLPVIFFIINNR 180

Query: 181 YGISTDITYSTKIPHLYMRADAYGIPGHYVEDGNDLMAVYEKMHEVINYVRSGNGPAIVE 240

YGIS I +T PHLY RA+AYG+PG Y EDGND+MAVYE M + + +VR GNGPAIVE 
Sbjct: 181 YGISMSINNAIOTPHLYTRAEAYGVPGFYCEIXSNDvmVYETMGKAVEHvRGGNGPAIVE 240 
Query: 241 VESYRWFGHSTADAGVYRTKEEVDSWKAKDPVKRYRAYLIENEIATEEELAAIEAQVIKE 300

VESYRWFGHSTADAG YRTKEEVD WK KDP+ +YR YL IAT++EL AI+AQV KE
Sbjct: 241 VESYRWFGHSTADAGKYRTKEEVDEWKEKDPMIKYRTYLTSEGIATDDELDAIQAQVKKE 300

Query: 301 VEEGVKFAEESPFPDMSVAFEDVFVD 326

V++ +FA+ SP P++SVAFEDV+VD
Sbjct: 301 VDDAYEFAQNSPDPELSVAFEDVWVD 326

A related GBS gene <SEQ ID 8797> and protein <SEQ ID 8798> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -14.75
GvH: Signal Score (-7.5): -4.24 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -3.03 threshold: 0.0 

INTEGRAL Likelihood = -3.03 Transmembrane 161 - 177 ( 161 - 178) 
PERIPHERAL Likelihood = 3.55 117 
modified ALOM score: 1.11 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.2211 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF0179K298 - 1278 of 1578) 

EGAD|108208|BS0806 (3 - 327 of 333) acetoin dehydrogenase E1 component {Bacillus subtilis}
OMNI|NT01BS0951 acetoin:DCPIP oxidoreductase alpha subunit

GP|2780395|dbj|BAA24296.1||D78509 YfjK {Bacillus subtilis}

GP|2633130|emb|CAB12635.1||Z99108 acetoin dehydrogenase E1 component (TPP-dependent alpha
subunit) {Bacillus subtilis} GP|2957146|gb|AAC05582.1||AF006075 TPP-dependent acetoin
dehydrogenase, E1 alpha-subunit {Bacillus subtilis} PIR|D69581|D69581 acetoin dehydrogenase
E1 component (TPP-dependent alpha subuni) acoA - Ba
%Match =26.3 

%Identity =45.3 %Similarity =65.7 

Matches = 148 Mismatches = 109 Conservative Sub.s = 67 

231 261 291 321 351 381 411 441 

F*IEMPFTKTKKAVQILASCEKNLYNN*VIKIFLEVRMVTLSKEQHLDMFLKMQRIRDVDMKFNKLVRRGFVQGMTHFSV

:|: ::|::|: | |: II II : I ::| :| : | |: 
MKLLKREGLSLTEEKALWMYQKMLEIRGFEDKVHELFAQGVLPGFVHLYA 
10 20 30 40 50 

471 501 531 561 591 621 651 681 

GEEAASVGAIQDLTDSDIIFSNHRGHGQTIAKGIDIGGMFAELAGKATGTSKGRGGSMHLANLEKGNYGTNGIVGGGYAL 

mi =n i i i ii mil: mi i= ii n= inn 11 = 11111 = 1 = 1 = 11 1 1111111= 1 

GEEAVAVGVCAHLHDGDSITSTHRGHGHCIAKGCDLDGMMAEIFGKATGLCKGKGGSMHIADLDKGMLGANGIVGGGFTL
60 70 80 90 100 110 120 130 



711 741 771 801 831 861 891 921 

AVGAALTQQYEGTDNIVIAFSGDSATNEGSFHESVNLAAVWNLPVIFFIINNRYGISTDITYSTKIPHLYMRADAYGIPG 

I 1 = 111 =1= I 1= = I II I |:| = lll milllllim II II =1 l = = = II II =11 
ACGSALTAKYKQTK3WSVCFFGDGANNQGTFHEGLNIAAVWNLPW 

140 150 160 170 180 190 200 210 

951 981 1011 1041 1071 1098 1128 1158 

HYVEDGNDLMAVYEKMHEVINYVRSGNGPAIVEVESYRWFGHSTAD^

II l==lll= I I 1=1 ll===l =11 =11 II 1=11=1 1= = II == == ll== I 
-VTVDGKDILAVYQAAEFAIERARNGGGPSLIECMTYRNY 

220 230 240 250 260 270 280 
1188 1218 1248 1278 1308 1338 1368 1398

EEEIAAIEAQVIKEVEEGVKFAEESPFPDMSVAFEDVFVD*N1^K*MRFISFY^^

:|= II :| = =1= I 1=1=11=! I = 11=1 
- - KLSDIEQRVSES I EKAVS FSEDS PYPKDSELLTD VYVSYEKGGM 
300 310 320 330

SEQ ID 8798 (GBS403) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 2; MW 64.4kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 76 (lane 4; MW 39.5kDa). 

GBS403-GST was purified as shown in Figure 218, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1394 

A DNA sequence (GBSx1479) was identified in S.agalactiae <SEQ ID 4277> which encodes the amino
acid sequence <SEQ ID 4278>. This protein is predicted to be an ABC transporter. Analysis of this protein
sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2464 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9775> which encodes amino acid sequence <SEQ ID 9776>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12414 GB:Z99107 similar to ABC transporter (ATP-binding
protein) [Bacillus subtilis]
Identities = 328/643 (51%), Positives = 443/643 (68%), Gaps = 9/643 (1%)

Query: 9 MIILQGNKIERSFSGDVLFDNINIQVDQRDRIALVGRNGAGKSTLLKILVGEEAPTKGEI 68

M+ILQ N++ +SF D + +NI ++V RDRIA+VGRNGAGKSTLLKI+ G+ + KGEI 
Sbjct: 1 MMILQANQLSKSFGADTII^IKLEVRNRDRIAIVGRNGAGKSTLLKIIAGQLSYEKGEI 60 

Query: 69 NKKRDLSLSYLAQDSRFQSENTIFQEMLQVFDSLREVEKRLRELELQMGQVSGSDLEQLM 128

K +D+++ YLAQ + S+ TI +E+L VFD L+ +EK +R +E +M +LE +M 

Sbjct: 61 IKPKDITMGYIAQHTGLDSKLTIKEELLWFDHLKAMEKEMRAMEEKMAAADPGELESIM 120 

Query: 129 KTYDILSEEFREKGGFTYESDIKAILNGFKFNSDIWEMPISELSGGQNTRLALAKMLLEK 188
KTYD L +EF++KGG+ YE+D++++L+G F+ + LSGGQ TRLAL K+LL +

Sbjct: 121 KTYDRLQQEFKDKGGYQYEADVRSVLHGLGFSHFDDSTQVQSLSGGQKTRLALGKLLLTQ 180 

Query: 189 PELLVLDEPTNHLDIDTIAWLENYLVNYQGALIIVSHDRYFLDKVATVTYDLTTHSLDRY 248
P+LL+LDEPTNHLDIDT+ WLE+YL Y GA++IVSHDRYFLDKV Y+++ +Y
Sbjct: 181 PDLLILDEPTNHLDIDTLTWLEHYLQGYSGAILIVSHDRYFLDKVVNQVYEVSRAESKKY 240

Query: 249 VGNYSKFMDLKAEKIATEEKNFEKQQKEIAKLEDFVQRNIVRASTTKRAQARRKQLEKME 308 

GNYS ++D KA + + K +EKQQ EIAKL+DFV RN+ RASTTKRAQ+RRKQLE+M+ 
Sbjct: 241 HGNYSAYLDQKAAQYEKDLKMYEKQQDEIAKLQDFVDRNLARASTTKRAQSRRKQLERMD 300 






Query: 309 RLDKPNVEQKSANMTFHAGKVSGNVVLTLENAAIGYEG-VSLSEPIDLDVKKFDAIAIVG 367 

+ KP ++KSAN F K SGN VL +++ I YE L + + + ++ A+VG 
Sbjct: 301 VMSKPLGDEKSANFHFDITKQSGNEVLRVQDLTISYENQPPLLTEVSFMLTRGESAALVG 360 






Query: 368 PNGIGKSTLIKSLVGQIPFIKGEAKLGANVETGYYDQSQSNLTKTNTVLDELWDAFSTTP 427
PNGIGKSTL+K+L+ + +G G+NV GYYDQ Q+ LT + VLDELWD + P 
Sbjct: 361 PNGIGKSTLLKTLIDTLKPDOGTISYGSNVSVGYYDQEQAELTSSKRVLDELWDEYPGLP 420

Query: 428 EVEII^LGAFLFSGDDVKKSVSMLSGGERARLLLAKLSMENNNFLILDEPTNHLDIDSK 487 

E EIR LG FLFSGDDV K V LSGGE+ARL LAKL ++ NFLILDEPTNHLD+DSK 
Sbjct: 421 EKEIRTCLGNFLFSGDDVLKPVHSLSGGEKARLALAKLMLQKANFLILDEPTNHLDLDSK 480

Query: 488 EVLENALIEFDGTLLFVSHDRYFINRVATKVLEISDKGSTLYLGDYDYYLTKKAELEELA 547 

EVLENALI++ GTLLFVSHDRYFINR+AT+VLE+S YLGDYDYY KK E EL 

Sbjct: 481 EVLENALIDYPGTLLFVSHDRYFINRIATRVLELSSSHIEEYLGDYDYYTEKKTEQLELE 540 

Query: 548 RLNEEEVSASKTEIDVTSD YETQKANQKEFRKITRRWEIEARLEVLENDENNING 603 

++N++E KT V SD YE +K +K+ R+ RR+ EIE ++ +E + + + 
Sbjct: 541 KMNQQE - ETDKTPATVKSDSKRS YEEEKEWKKKERQRLRRI EE I ETTVQTI EENI SRNDE 599 

Query: 604 LMLET NDIGKLSDLQKELESIQEEQLLLMEEWENLNMRLD 643

L+ + D K+ + + E + +E L+ EWE L+ D 

Sbjct: 600 LLCDPEVYQDHEKVQAIHADNEKLNQELESLLSEWEELSTEED 642 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4279> which encodes the amino acid 
sequence <SEQ ID 4280>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2042 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 473/635 (74%), Positives = 545/635 (85%), Gaps = 1/635 (0%) 

Query: 9 MIILQGNKIERSFSGDVLFDNINIQvDQRDRIALVGRNGAGKSTLLKILVGEEAPTKGEI 68 

MIILQGNK+ERSFSGDVLF NI++QVD+RDRIALVG NGAGKSTLLK+LVGEE PT GE+ 
Sbjct: 1 MIILQGNKLERSFSGDVLFQNISLQVDERDRIALVGPNGAGKSTLLKLLVGEETPTSGEV 60 

Query: 69 NKKRDLSLSYLAQDSRFQSENTIFQEMLQVFDSLREVEKRLRELELQMGQVSGSDLEQLM 128 

N K+DL+LSYLAQ+SRF+S+ TI++EML+VF++LR+ EKRLR++E+ M VSG L +LM 
Sbjct: 61 NTKKDLTLSYLAQNSRFESDQTIYEEMLKVFEALRQDEKRLRQMEMDMATVSGQVLTRLM 120 

Query: 129 KTYDILSEEFREKGGFTYESDIKAILNGFKFNSDMWEMPISELSGGQNTRLALAKMLLEK 188 

YD+L+E FR++GGFTYESDIKAILNGFKF+ MW+M I+ELSGGQNTRLALAKMLLEK 
Sbjct: 121 TDYDLLTEHFRQQGGFTYESDIKAIIiNGFKFDESMWQMTIAELSGGQNTRLALAKMLLEK 180 

Query: 189 PELLVLDEPTNHLDIDTIAWLENYLVNYQGALIIVSHDRYFLDKVATVTYDLTTHSDDRY 248 
PELLVLDEPTNHLDI+TIAWLENYL NYQGALI IVSHDRYFLDKVATVT DLT + LDRY 

Sbjct: 181 PELLVLDEPTNHLDIETIAWLENYLANYQGAL 1 1 VSHDRYFLDPCVATVTLDLTPNGLiDRY 240 

Query: 249 VGNYSKF^LKAEKIATEEKNFEKQQKEIAKLEDFVQRNIVRASTTKRAQARRKQLEKME 308 
GNYS+FM LKAEK+ EEK F+KQQKEIAKLEDFVQ+NIVRASTTKRAQARRKQLEK+E 
Sbjct: 241 SGNYSRFMALKAEKLVAEEKQFDKQQKEIAKLEDFVQKNIVRASTTKRAQARRKQLEKIE 300 

Query: 309 RLDKPNVEQKSANMTFHAGKVSGNWLTLENAAIGYEGVSLSEPIDLDVKKFDAIAIVGP 368 

RLDKP +KSA+MTFHA K SGNWL +E AAIGY LSEPI++D+ K DAIA+VGP 
Sbjct: 301 RLDKPTGGRKSAHMTFHAEKPSGJWvLRvEEAAIGYGDQVLSEPINVDINKLDAIAWGP 360 

Query: 369 NGIGKSTLIKSLVGQIPFIKGEAIOjGAl^TGYYDQSQSNLTKTNTVLDELWDAFSTTPE 428 

NGIGKSTLIKS++GQ+P +KG+ K GANVETGYYDQ+QS+LT +NTVL+ELW FSTTPE 
Sbjct: 361 NGIGKSTLIKSIIGQLPLLKGQLKYGffiNVETGYYDQTQSHLTSSNTVLEELWQDFSTTPE 420 

Query: 429 VEIRNRLGAFLFSGDDVKKSVSMLSGGERARLLLAKLSMENNNFLILDEPTNHLDIDSKE 488 

V+IRNRLGAFLFSGDDVKKSV+MLSGGE+ARLLLAKLSMENNNFL+LDEPTNHLDIDSKE 
Sbjct: 421 VDIRNRLGAFLFSGDDVKKSVAMLSGGEKARLLLAKLSMENNNFLVLDEPTNHLDIDSKE 480 

Query: 489 VLENALIEFDGTLLWSHDRYFINRVATKVLEISDKGSTLYLGDYDYYLTKKAELEELAR 548 
VLENALI+FDGTLLFVSHDRYFINR+ATKVLEI++ GSTLYLGDYDYYL KKAELEELAR 
Sbjct: 481 VLENALIDFDGTLLWSHDRYFINRIATKVLEITENGSTLYLfiDYDYYLEKKRELEELAR 540 

Query: 549 LNEEEVSASKTEIDVTSDYETQKANQKEFRKITRRVVEIEARLEVLENDENNINGLMLET 608 

L E E T DY+ QKANQKE R++TRR EIEftRLE +E I M + 

Sbjct: 541 LAAGETVEETKEASAT-DYQLQKflNQKERRRLTRRYEEIEARLETIEERIGAIQEDMHAS 599 

Query: 609 NDIGKLSDLQKELESIQEEQLLLMEEWENLNMRLD 643 

ND +L QKE + + +EQ LMEEWE + +++ 
Sbjct: 600 NDTAQLIAWQKEWDQLDQEQEALMEEWETIAEQIE 634 
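
Each summary line in these listings gives a count over the alignment length followed by a whole-number percentage. As a minimal sketch, assuming the printed percentage is the integer part of 100 x count / length (an assumption, though it reproduces the Identities and Gaps figures in this section, e.g. 473/635 gives 74% and 1/635 gives 0%), the arithmetic is shown below; the helper name `summary_percent` is illustrative only.

```python
def summary_percent(count: int, length: int) -> int:
    """Whole-number percentage as assumed for the alignment summary lines.

    The value is truncated rather than rounded, so 473/635 (74.49...)
    gives 74 and 1/635 (0.16...) gives 0.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 <= count <= length:
        raise ValueError("count must lie between 0 and the alignment length")
    return (100 * count) // length


# Identities and gaps from the GAS/GBS alignment summary above.
print(summary_percent(473, 635))  # 74
print(summary_percent(1, 635))    # 0
```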

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1395 

A DNA sequence (GBSx1480) was identified in S.agalactiae <SEQ ID 4281> which encodes the amino 
acid sequence <SEQ ID 4282>. This protein is predicted to be thiophene degradation protein F (thdF). 
Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0876 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9773> which encodes amino acid sequence <SEQ ID 9774> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4283> which encodes the amino acid 
sequence <SEQ ID 4284>. Analysis of this protein sequence reveals the following: 

Possible site: 34 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0795 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 384/458 (83%), Positives = 427/458 (92%) 

Query: 12 MSITKEFDTIAAISTPLGEGAIGIVRISGTDALKIASKIYRGKDIiSAIQSHTLNYGHIVD 71 

MSITKEFDTI AISTPLGEGAIGIVR+SGTDAL IA +++GK+L + SHT+NYGHI++ 
Sbjct: 1 MSITKEFDTITAISTPLGEGAIGIVRLSGTDALAIAQSVFKGKNIiEQVASHTINYGHIIN 60 

Query: 72 PDKNEILDEVMLGvMLAPKTFTREDVIEINTHGGIAVTNEILQLILRHGARMAEPGEFTK 131 
P I+DEVM+ VMLAPKTFTRE+V+EINTHGGIAVTNEILQL++R GARMAEPGEFTK 

Sbjct: 61 PKTGTIIDEVMVSVMIAPKTFTRENVVEINTHGGIAVTNEILQLLIRQGARMAEPGEFTK 120 

Query: 132 RAFLNGRVDLTQAEAVMDLIRAKTDKA^IAvKQDTC 191 
RAFLNGRVDLTQAEAVMD+ IRAKTDKflM IAVKQLDGSL LIN+TRQEILNTLAQVEVN 
Sbjct: 121 RAFLNGRVDLTQAFJWITOIIRAKTDKAOTIAVKQIJIGSLSQLINDTRQEIIjOTI^QvEW 180 

Query: 192 IDYPEYDDVEEMTTTLMREKTQEFQALMENLLRTARRGKILREGLSTAIIGRPNVGKSSL 251 

IDYPEYDD VEEMTT L+REKTQEFQ+L+E+LLRTA+RGKILREGLSTAI IGRPNVGKSSL 
Sbjct: 181 IDYPEYDDVEEMTTALLREKTQEFQSIjLESLI.RTAKRGKILREGLSTAIIGRPNVGKSSL 240 

Query: 252 Ii^LREEKAIVTDIEGTTRDVIEEYVNIKGVPLKLVDTAGIRDTDDIVEKIGVERSKKA 311 

LNNLLRE+KAIVTDI GTTRDVIEEYVNIKGVPIiKLVDTAGIR+TDD+VE+IGVERSKKA 
Sbjct: 241 LNNIiLREDKAIOTDIAGTTRDVIEEYVNIKGVPLKLVDTAGIRETDDLVEQIGVERSKKA 300 
Query: 312 LEEADLVLLVIMSSEPI.TLQDRSLLELSKESNRIVLLNKTDLPQKIEVNEIjPKNVIPISV 371 

L+EADLVLLVLN+SE LT QDR+LL LS++SNR1+LLNKTDL QKIE+ +LP + IPISV 
Sbjct: 301 LQEADLVIjLVLNASEKLTDQDRALLNLSQDSNRI ILLNKTDLEQKIELEQLPDDYI PI SV 360 

Query: 372 LENENIDKIEERI^IFFDNAGMVEHDATYLSNARHISLIEKAVDSLKAVNEGLELGMPV 431 

L N+NI+ IE+RIN +FFDNAG+VE DATYLSNARHI SLIEKAV SL+AVN+GL LGMPV 
Sbjct: 361 LTNQNINLIEDRINQLFFDNAGLVEQDATYLSNARHISLIEKAVQSLEAVNDGIiALGMPV 420 

Query: 432 DLLQVDMTRTWEILGEITGDAAPDELITQLFSQFCLGK 469 

DLLQVD+TRTWEILGEITGDAAPDELITQLFSQFCLGK 
Sbjct: 421 DLLQVDLTRTWEILGEITGDAAPDELITQLFSQFCLGK 458 
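
The last segment of the thdF alignment above contains no gaps, so for each sequence the number of residues should equal end - start + 1 (469 - 432 + 1 = 38 for the query, 458 - 421 + 1 = 38 for the subject), and identical positions can be counted directly. A minimal sketch of that check, assuming an ungapped segment; the helper name `compare_ungapped` is hypothetical.

```python
def compare_ungapped(query, sbjct, q_start, q_end, s_start, s_end):
    """Count identical positions in an ungapped aligned segment.

    Also checks that each segment length matches its inclusive start/end
    coordinates, and reports mismatches using query numbering.
    """
    if len(query) != len(sbjct):
        raise ValueError("ungapped segments must have equal length")
    if "-" in query or "-" in sbjct:
        raise ValueError("segment contains gaps")
    if len(query) != q_end - q_start + 1 or len(sbjct) != s_end - s_start + 1:
        raise ValueError("segment length does not match the printed coordinates")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for a, b in pairs if a == b)
    mismatches = [(q_start + i, a, b) for i, (a, b) in enumerate(pairs) if a != b]
    return identities, mismatches


q = "DLLQVDMTRTWEILGEITGDAAPDELITQLFSQFCLGK"
s = "DLLQVDLTRTWEILGEITGDAAPDELITQLFSQFCLGK"
print(compare_ungapped(q, s, 432, 469, 421, 458))  # (37, [(438, 'M', 'L')])
```

The single mismatch, query residue 438 (M) against subject L, is the position shown as + in the midline.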

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1396 

A DNA sequence (GBSx1481) was identified in S.agalactiae <SEQ ID 4285> which encodes the amino 
acid sequence <SEQ ID 4286>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.18 Transmembrane 280 - 296 ( 276 - 299) 
INTEGRAL Likelihood = -4.83 Transmembrane 249 - 265 ( 243 - 266) 

Final Results 

bacterial membrane Certainty=0.4673 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD40365 GB:AF036485 hypothetical protein [Plasmid pNZ4000] 

Identities = 88/306 (28%), Positives = 149/306 (47%), Gaps = 17/306 (5%) 

Query: 1 MIVEQKFGNGFTWIN IEAEQLRTETSEIQAKY-LDSEIITYALDDYERAFMECSHIK 56 

MI +K NG WI I AE+ T ++ +Y +D +11 Y D+ E I 
Sbjct: 1 MIKPEKTINGTKWIETIQINAEERAT LEDQYGIDEDI IEYVTDNDESTNYVYD- IN 55 

Query: 57 GKEVLTIIFNTIDLKQKESYYETVPMTFCLSHDRLITVTRSRNSYMLELLQKYLDRNPDV 116 
+ LI L+ YTP L LT+S +LLD NP+V 

Sbjct: 56 EDDQLFIFLAPYALDKDALRYITQPFGMLLHKGVLFTFNQSGIPEVNTALYSALD-NPEV 114 

Query: 117 -SPKKFLFAALTLITKQYFNWSKIDREKDILNRQLREQTTNKRLLAMSDLETGSVYLLT 175 

S F+ L + + + I ++++ L++ L +T N L+++S L+ +L + 
Sbjct: 115 KSVDAFILETLFTVWSFIPISRAITKICRNYLDKMLNRKTICNSDLVSLSYLQQTLTFLSS 174 

Query: 176 AANQNALVLEQLDVHPSQRFNSEVEKEQLS DALIEAHQLVSMTQLNSQVLSQLSSTF 232 

A N L +LD P F +++++ D IE Q+ M ++ +QV+ ++ T 
Sbjct: 175 AVQTN LSELDRLPKTHFGVGADQDKIDLFEDVQIEGEQVQRMFEIETQWDRIDHTL 231 

Query: 233 NNVlNNNIiNENLTGI^IISINIAIIAAITGFFGMNIPLPLTESRSSWLIVIATSVLLWVI 292 
N++ NNNLN+ + L I S+ +A+ I+GF+GMN+ LPL + +W++ + SV+L V 

Sbjct: 232 NSLANNNLNDTMKFLTIWSLTMAVPTIISGFYGNMWCLPIAGMQYAWMLTLGISVVLIVA 291 

Query: 293 IAQILK 298 
+ +LK 

Sbjct: 292 MLIMLK 297 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
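
Each INTEGRAL line in these listings reports a likelihood, a residue range and a second range in parentheses. A minimal sketch for pulling those numbers out of the two lines reported for GBSx1481 above; the regular expression is written only for the formatting seen in this listing, and reading the parenthesized pair as a wider bounding range is an assumption, not something the source states. The names `TM_LINE` and `parse_tm` are hypothetical.

```python
import re

# Pattern for lines such as
# "INTEGRAL Likelihood = -9.18 Transmembrane 280 - 296 ( 276 - 299)".
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm(line):
    """Return the likelihood and ranges from one INTEGRAL line, or None."""
    match = TM_LINE.search(line)
    if match is None:
        return None
    start, end, outer_start, outer_end = (int(g) for g in match.groups()[1:])
    return {
        "likelihood": float(match.group(1)),
        "segment": (start, end),
        "segment_length": end - start + 1,  # residue numbers are inclusive
        "outer_range": (outer_start, outer_end),
    }


lines = [
    "INTEGRAL Likelihood = -9.18 Transmembrane 280 - 296 ( 276 - 299)",
    "INTEGRAL Likelihood = -4.83 Transmembrane 249 - 265 ( 243 - 266)",
]
for line in lines:
    print(parse_tm(line))
```

Both reported segments span 17 residues (280 to 296 and 249 to 265).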
Example 1397 

A DNA sequence (GBSx1482) was identified in S.agalactiae <SEQ ID 4287> which encodes the amino 
acid sequence <SEQ ID 4288>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1437 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
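
Each "Final Results" block lists a certainty value for three compartments, and wherever a compartment is marked Affirmative in this section it is the one with the highest value. A minimal sketch that reads such a block and returns the top-scoring compartment; the pattern also tolerates the stray spaces inside numbers (e.g. "Certainty=0 . 0000") seen in parts of this listing, and the names `parse_locations` and `top_location` are hypothetical.

```python
import re

# Matches lines such as
# "bacterial cytoplasm Certainty=0.1437 (Affirmative) < suco"
# and tolerates OCR-style spacing such as "Certainty=0 . 1437".
LOC_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_locations(lines):
    """Return (compartment, certainty, status) for every matching line."""
    results = []
    for line in lines:
        match = LOC_LINE.search(line)
        if match:
            value = float("".join(match.group(2).split()))
            results.append((match.group(1), value, match.group(3)))
    return results


def top_location(lines):
    """Return the compartment with the highest certainty, or None if none parsed."""
    hits = parse_locations(lines)
    if not hits:
        return None
    compartment, value, _status = max(hits, key=lambda hit: hit[1])
    return compartment, value


block = [
    "bacterial cytoplasm Certainty=0.1437 (Affirmative) < suco",
    "bacterial membrane Certainty=0.0000 (Not Clear) < suco",
    "bacterial outside Certainty=0 . 0000 (Not Clear) < suco",
]
print(top_location(block))  # ('cytoplasm', 0.1437)
```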

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1398 

A DNA sequence (GBSx1483) was identified in S.agalactiae <SEQ ID 4289> which encodes the amino 
acid sequence <SEQ ID 4290>. This protein is predicted to be exonuclease RexA. Analysis of this protein 
sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3165 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9771> which encodes amino acid sequence <SEQ ID 9772> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC12966 GB:U76424 exonuclease RexA [Lactococcus lactis] 
Identities = 522/1211 (43%), Positives = 747/1211 (61%), Gaps = 73/1211 (6%) 

Query: 28 KRTPEQIEAIYTFGNNVLVSASAGSGKTFVTWERILDKLLRGVPIDSLFISTFTVKAAGE 87 
K TPEQ EAI++ G N+LVSASAGSGKTFVM +RI++K+ +G+ ID LFISTFT KAA E 
Sbjct: 5 KLTPEQNEAIHSSGKNILVSASAGSGKTFVMAQRIVEKVKQGIEIDRLFISTFTKKAASE 64 

Query: 88 LKERLEKKINESLKSAESDDLKQFLTQQLVGIQTADIGTMDAFTQKIVNQYGYTLGISPI 147 
L+ RLE+ + ++ + + D+ LT L + ADIGTMD+FTQK+ + I P 
Sbjct: 65 LRMRLERDLKKARQESSDDEEAHRLTLALQNLSNADIGTMDSFTQKLTKANFNRVNIDPN 124 

Query: 148 FRILQDKNEQDVIKNE VYADLFSDYMTGKNAAS FIKLVKNFSGNRKDSKAFREMV 202 
FRIL D+ E D+I+ EV+ L Y++ + + F KL+KNFS +R + F+++V 
Sbjct: 125 

Query: 203 
Y +Y F+ +T+NP W++ FLKG +TY +++ D +NV + T +L + 
Sbjct: 184 

Query: 261 
++D+ TA Ii I ++ V S+D L KK + +D+ 
Sbjct: 241 KKDFVTCTAL FLSIDTDIRVGSSKDEALSALKKDFSAQKQDL 282 

Query: 320 VWAGVKYPIFKQLHNRIVGLKHLEVIFKYQGESLFLLELLQSFVLDFSEQYLQEKIQEN 379 
V P +L + +KH ++I KYQ ++ + LQ F++DF + YL+ K EN 
Sbjct: 283 - -VGSKSKP- -GELRKFVDKIKHGQLIEKYQNQAFEIASDLQKFIIDFYKTYLERKKNEN 338 

Query: 380 AFEFSDIAHFAIQILEEISHDIRQLYQDKYHEVMVDEYQDNNHTQERMLELLSNGHNRFMV 439 
AFE+SDIAHFAI + ILEEN DIR+ ++ Y E+M+DEYQD +HTQERMLELLSNGHN FMV 

Sbjct: 339 AFEYSDIAHFAIEILEENPDIRENLREHYDEIMIDEYQDTSHTQERMLELLSNGHNLFMV 398 

Query: 440 GDIKQSIYRFRQADPQIFNDKYKAYQDNPSQGKLIILKENFRSQSEVLDSTNSVFTHLMD 499 
GDIKQSIY FR ADP +F +KYK+Y + +LI LKENFRS+ EVL+ TN +F HLMD 
Sbjct: 399 GDIKQSIYGFRLADPGLFLEKYKSYDQAENPNQLIRLKENFRSRGEVLNFTNDIFKHLMD 458 

Query: 500 EEVGDILYDESHQLKAGS PRQQERHPNNKTQVLLLDTDEDDI DDSDSQQYDI S PAE 555 

E++G++ Y + L G+ P + E+ + + +T E++I+DS+ + IS E 
Sbjct: 459 EKLGEMTYGKEEALVQGNI SDYPVEAEKDFYPELLLYKENTSEEE I EDSEVK ISDGE 515 

Query: 556 AKLVAKEIIRLHKEENVPFQDITLLVSSRTRNDGILQTFDRYGIPLVTDGGEQNYLKSVE 615 

K A+EI +L E V +DI +LV S++ N+ I Y IP+V D G ++LKS+E 

Sbjct: 515 1 KGAAQE I KKL - IEYGVEPKD IAI LVRSKSNNNKIED I LLS YD I PWLDEGRVDFLKSME 574 

Query: 616 VMVMLDTLRS IDNPLNDYALVALLRS PMFGFNEDDLTRI AI QDVK- MAFYHKVKLSYHKE 674 

V++MLD LR+IDNPL D +LVA+LRSP+FGFNED+LTRI++Q + + F+ K+ LS KE 
Sbjct: 575 VLIMLDVLRAIDNPLYDLSLVAMLRSPLFGFNEDELTRISVQGSRDLRFWDKILLSLKKE 634 

Query: 675 GHHSDLITPELSSKIDHFMKTFQTWRDFAKWHSLYDLIWKIYNDRFYYDYVGALPKAEQR 734 
G + +LI L K+ F + F WR ++ L+WKIY + +Y+DYVGAL E R 

Sbjct: 635 GKNPELINLSLEQKLKAFNQKFTEWRKLVNKIPIHRLLWKIYTETYYFDYVGALKNGEMR 694 

Query: 735 QANLYALALRANQFEKTGFKGLSRFIRMIDKVLENElSroLADVEVALPQNAVNLMTIHKSK 794 
QANL AL++RA +E +G+KGL +F+R+I+K +E NDLA V + LPQNAV +MT HKSK 
Sbjct: 695 QANLQALSVRAESYESSGYKGLFKFVRLINKFMEQNNDLASVNIKLPQNAVRVMTFHKSK 754 

Query: 795 GLEFKYVFILNIDKKFSMTOITSPLILSRNQGIGIKYVADMRHELEE-EILPAVKVSMET 853 

GLEF YVF++N+ +F+ D+ +ILSR G+G+KY+AD++ E + P V MET 

Sbjct: 755 GLEFDYVFLMNLQSRFNDRDLKEDVILSREHGLGMKYIADDKAEPDVITDFPYALVKMET 814 

Query: 854 LPYQLNKRELRLATLSEQMRLLYVAMTRAEKKLYLVGKASQT KWADHYDLVS-ENNH 909 

PY +NK + A LSE+MR+LYVA TRA+ KKLYLVGK T + YD + E 

Sbjct: 815 FPYMVNKDLKQFJiALSEEMRVLYVAFTRAKKKLYLVGKIKDTDKKAGLELYDAATLEGKI 874 

Query: 910 LPLASRETFVTFQDWLLAVHETYKKQELFYDINFVSLEELTDHHIGMVNPSLPFNPDNK- 968 
L R + FQ W+LA+ K L +N + +EL + + PD K 

Sbjct: 875 LSDKFRNSSRGFQHWILALQNATK LPMKLNVYTKDELETEKLEFTS QPDFKK 926 

Query: 969 -VENRQSEDIVRAIS- -VLESVEQINQTY--KAAIELPTVRTPSQVKK-IYEPILDIEGV 1022 
VE + D + + S+E+ + +NY +AA EL +++TPSQVKK YE L + V 

Sbjct: 927 LVEESEKFDNIMSFSDEIKEAQKIMNYQYPHQAATELSSIQTPSQVKKRSYEKQLQVGEV 986 

Query: 1023 D-VMETITKTSVDFKLPDFSTSKKQDPAALGSAVHELMQRIEMSSHVKMEDIQKALTEVN 1081 

V E + ++DF DF KK A +GSA H MQ + S + Q L E+ 

Sbjct: 987 QPVSEFVRVKNLDFS - - DFG - PKKITAAEMGSATHSFMQYADF - SQADLFS FQATLDEMG 1042 

Query: 1082 AETSVKAAIQIEKINYFFQETSLGKYIQEEVEHLHREAPFAMLKEDPESGEKFVVRGIID 1141 

+ +K I I KI F +T G+++ E V+ +EAPF+ML+ D + E+++VRGI D 
Sbjct: 1043 FDEKIKNQIDITKILTLF-DTEFGQFLSENVDKTVKEAPFSMLRTDEFAKEQYIVRGICD 1101 

Query: 1142 GYLLLENRIILFDYKTDKFVNP LELKERYQGQMALYAEALKKSYEIEKIDKYLILLG 1198 

G++ L ++IILFDYKTD+F N E+KERY+ QM LY+EAL+K+Y + +IDKYLILLG 

Sbjct: 1102 GFVKLADKIILFDYKTDRFTNVSAISEIKERYKDQMNLYSEALQKAYHVNQIDKYLILLG 1161 

Query: 1199 G-KQLEWKMD 1208 

G +++ V K+D 
Sbjct: 1162 GPRKVFVEKID 1172 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4291> which encodes the amino acid 
sequence <SEQ ID 4292>. Analysis of this protein sequence reveals the following: 

Possible site: 61 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC12966 GB:U76424 exonuclease RexA [Lactococcus lactis] 
Identities = 478/1206 (39%), Positives = 700/1206 (57%), Gaps = 65/1206 (5%) 

Query: 40 KRTAQQIEAIYTSGQNILVSASAGSGKTFVMVERILDKILRGVSIDRLFISTFTVKAATE 99 

K T +Q EAI++SG+NILVSASAGSGKTFVM +R1++K+ +G+ IDRLFISTFT KAA+E 
Sbjct: 5 KLTPEQNEAIHSSGKNILVSASAGSGKTFVMAQRIVEKVKQGIEIDRLFISTFTKKAASE 64 

Query: 100 LRERIENKLYSQIAQTTDFQMKVYLTEQLQSLCQADIGTMDAFAQKWSRYGYSIGISSQ 159 

LR R+E L +++D + LT LQ+L ADIGTMD+F QK+ + I 

Sbjct: 65 LRmLERDLKKARQESSDDEEAHRLTlALQNLSNADIGTMDSFTQKLTKANFNRVNIDPN 124 

20 Query: 160 FRIMQDKAEQDVLKQEVFSKLFNEFMNQKEA PVFRALVKNFSGNCKDTSAFRELV 214 

FRI+ D+ E D+++QEVF +L +++ E+ F L+KNFS + ++ F+++V 

Sbjct: 125 FRILADQTESDLIRQEVFEQLVESYLSADESLNISKDKFEKLIKNFSKD-RNILGFQKVV 183 

Query: 215 YTCYSFSQSTENPKIWLQENFXjSAAKTYQRLEDIPDHDIELLLIAMQDTANQLRDVTDME 274 
YT Y F+ +TENP WL+ FL +TY+ LD+ + D+ + T+L+ + 

Sbjct: 184 YTIYRFASATENPISWLENQFLKGFETYKSLTDLSE-DFTVNVKENLLTFFELLEAISKK 242 

Query: 275 DYGQLTKAG- SRSAKYTKHLTI IEKLSDWVRDFKCLYGKAGLDRLIRDVTGLI PSGND VT 333 
D+ T S + E LS +DF D+ 
Sbjct: 243 DFVTCTALFLSIDTDIRVGSSKDEALSALKKDFSA QKQDLV 283 

Query: 334 VSKVKYPVFKTLHQKLKQFRHLETILMYQKI)CFSLIiEQLQDFVLAFSEAYIAVKIQESAF 393 

SK K + K+K H + I YQ F + LQ F++ F + YL K E+AF 
Sbjct: 284 GS KSKPGELRKFVDKI K HGQLIEKYQNQAFEIASDLQKFI IDFYKTYLERKKNENAF 340 

Query: 394 EFSDIAHFAIKILEENTDIRQSYQQHYHEVMVDEYQDNNHMQERLLTLLSNGHNRFMVGD 453 

E+SDIAHFAI+ILEEN DIR++ ++HY E+M+DEYQD +H QER+L LLSNGHN FMVGD 
Sbjct: 341 EYSDIAHFAIEILEENPDIRENLREHYDEIMIDEYQDTSHTQERMLELLSNGHNLFMTCD 400 

Query: 454 IKQSIYRFROADPQIFNQKFRDYQKKPEQGK^ILLKENFRSQSEVLNVSNAVFSHLMDES 513 

IKQSIY FR ADP +F +K++ Y + ++I LKENFRS+ EVLN +N +F HLMDE 

Sbjct: 401 IKQSIYGFRLADPGLFLEKYKSYDQAENPNQLIRLKENFRSRGEVLNFTNDIFKHLMDEK 460 

Query: 514 VGDVLYDEQHQLIAG- -SHAQTVPYDDRRAQLLLYNSDKDDGNAPSDSEGISFSEVTIVA 571 
+G++ Y ++ b+ G S D +LLLY + + IS E+ A 

Sbjct: 461 LGEMTYGKEEALVQGNISDYPVEAEKDFYPELLLYKENTSEEEIEDSEVKISDGEIKGAA 520 

Query: 572 KEIIKLHNDKGVPFEDITLLVSSRTRNDIISHTFNQYGIPIATDGGQQNYLKSVEVMVML 631 
+EI KL + GV +DI +LV S++ N+ I Y IP+ D G+ ++LKS+EV++ML 

Sbjct: 521 QEIKKL-IEYGVEPKDIAILTOSKSNNNKIEDILLSYDIPVVLDEGRVDFL.KSMEvIjIML 579 

Query: 632 DTLRTINNPRNDYALVALLRSPMFAFDEDDLARIALQKDNELDKDCLYDKIQRAVIGRGA 691 

D LR I+NP D +LVA+LRSP+F F+ED+L RI++Q +L +DKI ++ G 

Sbjct: 580 DVLRAIDNPLYDLSLVAMLRSPLFGFNEDELTRISVQGSRDLR FWDKILLSLKKEGK 636 

Query: 692 HPELIHDTLLGKI^FLKTLKSWRRYAKLGSLYDLIWKIFNDRFYFDFVASQAKAEQAQA 751 

+PELI+ +L KL F + WR+ ++ L+WKI+ + +YFD+V + E QA 

Sbjct: 637 NPELINLSLEQKLKAFNQKFTEWRKLVNKIPIHRLLWKIYTETYYFDYVGALKNGEMRQA 696 

Query: 752 NLYALALRANQFEKSGYKGLYRFIKMIDKVLETQNDLADVEVATPKQAVNLMTIHKSKGL 811 

NL AL++RA +E SGYKGL++F+++I+K +E NDLA V + P+ AV +MT HKSKGL 
Sbjct: 697 NLQALSVRAESYESSGYKGLFKFVRLINKFMEQNNDIASVNIKLPQNAVRVMTFHKSKGL 756 

Query: 812 QFPYVFIUJCDKRFSMTDIHKSFIIfflRQHGlGIKYLADIKGLLGE-TTLNSVKVSMETLP 870 
+F YVF++N RF+ D+ + IL+R+HG+G+KY+AD+K T V MET P 

Sbjct: 757 EFDYVFLMNLQSRFNDRDLKEDVILSREHGLGMKYIADLKAEPDVITDFPYALVKMETFP 816 
Query: 871 YQIJsrKQELRLATLSEEMRLLYVflMTRflEKKVYFIGK ASKSKSQEITDPKKL-GKLLP 926 

Y +NK + A LSEEMR+LYVA TRA+KK+Y +GK K E+ D L GK+L 

Sbjct: 817 YMVNKDLKQRAALSEEmVLYVAFTRAKKKLYLVGKIKDTDKKAGLELYDAATLEGKILS 876 

Query: 927 LALREQLLTFQDWLLAIADIFSTEDLYFDVRFIEDSDLTQESVGRLQTP---QLIjNPDDL 983 

R FQ W+LA+ + L + +L E + P +L+ + 

Sbjct: 877 DKFRNSSRGFQHWIIiALQ NATKLPMKIiNVYTKDELETEKLEFTSQPDFKKLVEESEK 933 

Query: 984 KDNRQSETIARALDMLEAVSQIiNANY--EAAIHLPTVRTPSQL-KATYEPLLEPIGVDII 1040 
DN S + ++ EA +N Y +AA L +++TPSQ+ K +YE L+ V + 

Sbjct: 934 FDNIMSFSD E I KEAQKIMNYQYPHQAATELSS I QTPSQVKKRSYEKQLQVGEVQPV 989 

Query: 1041 EKSSRSLSDFTLPHFSKKAKVEASHIGSALHQLMQVLPLSKP - - INQQTLLDALRGIDSN 1098 
+ R + + F K K+ A+ +GSA H MQ S+ + Q LD + G D 

Sbjct: 990 SEFVR-VKNLDFSDFGPK-KITAAEMGSATHSFMQYADFSQADLFSFQATLDEM-GFD-- 1044 

Query: 1099 EEVKTALDLKKIESFFCDTSLGQFFQTYQKHLYREAPFAILKLDPISQEEYVLRGIIDAY 1158 

E++K +D+ KI + F DT GQF +EAPF++L+ D ++E+Y++RGI D + 

Sbjct: 1045 EKIKNQIDITKILTLF-DTEFGQFLSENVDKTVKEAPFSMLRTDEFAKEQYIVRGICDGF 1103 

Query: 1159 FLFDDHIVLVDYKTDKYKQP IELKKRYQQQLELYAEALTQTYKLPVTKRYLVLMGGG 1215 

D I+L DYKTD++ E+K+RY+ Q+ LY+EAL + Y + +YL+L+GG 

Sbjct: 1104 VKLADKI1LFDYKTDRFTNVSAISEIKERYKDQMNLYSEALQKAYHVNQIDKYLILLGGP 1163 

Query: 1216 KPEIVE 1221 

+ VE 
Sbjct: 1164 RKVFVE 1169 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 728/1211 (60%), Positives = 916/1211 (75%), Gaps = 5/1211 (0%) 

Query: 1 MMTFKPFLNPEDIAVIQ/TEEKNSDKI^KRTPEQIEAIYTFGMWLViSASAGSGKTFVMVE 60 

+++F PFL+PE I +Q E+ D+ QKRT +QIEAIYT G N+LVSASAGSGKTFVMVE 
Sbjct: 13 VISFAPFLSPEAIKHLQENERCRDQSQKRTAQQIEAIYTSGQNILVSASAGSGKTFVMVE 72 

Query: 61 RILDKLLRGVPIDSLFISTFTVKAAGELKERLEKKINESLKSAESDDLKQFLTQQLVGIQ 120 

RILDK+LRGV ID LFISTFTVKAA EL+ER+E K+ + +K +LT+QL + 

Sbjct: 73 RILDKILRGVSIDRLFISTFTVKAATELRERIENKLYSQIAQTTDFQMKVYLTEQLQSLC 132 

Query: 121 TADIGTMDAFTQKIVNQYGYTLGISPIFRILQDKNEQDVIKNEVYADLFSDYMTGKNAAS 180 

ADIGTMDAF QK+V++YGY++GIS FRI+QDK EQDV+K EV++ LF+++M K A 
Sbjct: 133 QADIGTMDAFAQKWSRYGYSIGISSQFRIMQDKAEQDVLKQEVFSKLFNEFMNQKEAPV 192 

Query: 181 FIKLVKNFSGNRKDSKAFRE^^VYKVYAFSQSTDNPKRWMQTVFLKGAQTYTDFEAIPDQE 240 
F LVKNFSGN KD+ AFRE+VY Y+FSQST+NPK W+Q FL A+TY E IPD + 

Sbjct: 193 FRALVKNFSGNCKDTSAFRELVYTCySFSQSTENPKIWLQENFLSAAKTYQRLEDIPDHD 252 

Query: 241 VSSLLNVMQTTANQLRDLTDQEDYKQLTAKGVPTANYKKHLKIIENLVHWSQDFNLLYGK 300 
+ LL MQ TANQLRD+TD EDY QLT G +A Y KHL HE L W +DF LYGK 
Sbjct: 253 IELLLLAMQDTANQLRDVTD^DYGQLTKAGSRSAKYTKHLTIIEKLSDWVRDFKCLYGK 312 

Query: 301 KGLTNIARDITWIPSGNDVTVAGVKYPIFKQLHNRIVGLKHLEVIFKYQGESLFLLELL 360 

GL L RD+T +IPSGNDVTV+ VKYP+FK LH ++ +HLE I YQ + LLE L 
Sbjct: 313 AGLDRLIRDOTGLIPSGNDVTVSKVKYPVFKTLHQKLKQFRHLETILMYQKDCFSLLEQL 372 

Query: 361 QSFVLDFSEQYLQEKIQENAFEFSDIAHFAIQILEENHDIRQLYQDKYHEVMVDEYQDNN 420 

Q FVL FSE YL KIQE+AFEFSDIAHFAI+ILEEN DIRQ YQ YHEVMVDEYQDNN 
Sbjct: 373 QDFVLAFSEAYIAVKIQESAFEFSDIAHFAIKILEENTDIRQSYQQHYHEVMVDEYQDNN 432 

Query: 421 HTQERMLELLSNGHNRFMVGDIKQSIYRFRQADPQIFNDKYKAYQDNPSQGKLIILKENF 480 

H QER+L LLSNGHNRFMVGDIKQSIYRFRQADPQIFN K++ YQ P QGK+I+LKENF 
Sbjct: 433 HMQERLLTLLSNGHNRFMVGDIKQSIYRFRQADPQIFNQKFRDYQKKPEQGKVILLKENF 492 

Query: 481 RSQSEVLDSTNSVFTHLMDEEVGDILYDESHQLKAGSPRQQERHPNNKTQVLLLDTDEDD 540 
RSQSEVL+ +N+VF+HLMDE VGD+LYDE HQL AGS Q + + + Q+LL ++D+DD 

Sbjct: 493 RSQSEVLNVSNAVFSHLMDESVGDVLYDEQHQLIAGSHAQTVPYLDRRAQLLLYNSDKDD 552 
Query: 541 IDDSDSQQYDISPAEAKLVAKEIIRLHKEENVPFQDITLLVSSRTRNDGILQTFDRYGIP 600 

++ S IS +E +VAKEII+LH ++ VPF+DITLLVSSRTRND I TF++YGIP 
Sbjct: 553 -GNAPSDSEGISFSEVTIVAKEIIKLHNDKGVPFEDITLLVSSRTRNDIISHTFNQYGIP 611 

Query: 601 L VTDGGEQNYLKS vEVMVMIiDTLRS I DNPLNDYALVALLRS PMFGFNEDDLTRIAIQD - - 658 

+ TDGG+QNYLKSVEVMVMLDTLR+I+NP NDYAIiVALIiRSPMF F+EDDL RIA+Q 
Sbjct: 612 IATDGGQQNYLKSVEVMVMLDTLRTINNPRNDYALVALLRSPMFAFDEDDLARIALQKDN 671 

Query: 659 --VKMAFYHKVKLSYHKEGHHSDLITPELSSKIDHFMKTFQTWRDFAKWHSLYDLIWKIY 716 
K Y K++ + G H +LI L K++ F+KT ++WR +AK SLYDLIWKT+ 

Sbjct: 672 ELDKDCLYDKIQRAVIGRGAHPELIHDTLLGKIJWFLKTLKSWRRYAKLGSLYDLIWKIF 731 

Query: 717 NDRFYYDYVGALPKAEQRQANLYALALRANQFEKTGFKGLSRFIRMIDK^ENENDLADV 776 
NDRFY+D+V + KAEQ QANLYALALRANQFEK+G+KGL RFI+MIDKVLE +NDLADV 
Sbjct: 732 NDRFYFDFVASQAKAEQAQANLYALALRANQFEKSGYKGLYRFIKMIDKVLETQNDLADV 791 

Query: 777 EVALPQNAVNLMTIHKSKGLEFKWFlIiNIDKKFSMVDITSPr.ILSRNQGIGIKYVADMR 836 

EVA P+ AVNLMTIHKSKGL+F YVFILN DK+FSM DI IL+R GIGIKY+AD++ 
Sbjct: 792 EVATPKQAVNLMTIHKSKGLQFPYVFILNCDKRFSMTDIHKSFIliNRQHGIGIKYLADIK 851 

Query: 837 HELEEEILPAVKVSMETLPYQLNKRELRLATLSEQMRLLYVAMTRAEKKLYLVGKASQTK 896 

L E L +VKVSMETLPYQLNK+ELRIATLSE+MRLLYVAMTRAEKK+Y +GKAS++K 
Sbjct: 852 GLLGETTLNSVKVSMETLPYQLNKQELRLATLSEEMRLLYVAMTRAEKKVYFIGKASKSK 911 

Query: 897 WADHYDLVSENNHLPIASRETFVTFQDWLLAVHETYKKQELFYDINFVSLEELTDHHIGM 956 

+ D LPLA RE +TFQDWLLA+ + + ++L++D+ F+ +LT +G 

Sbjct: 912 SQEITDPKKLGKLLPLALREQLLTFQDWLIiAIADIFSTEDLYFDVRFIEDSDLTQESVGR 971 

Query: 957 WPSLPFNPDNKVENRQSEDIVRAISVLESVEQINQTYKAAIELPTVRTPSQVKKIYEPI 1016 
+ NPD+ +NRQSE I RA+ +LE+V Q+N Y+AAI LPTVRTPSQ+K YEP+ 

Sbjct: 972 LQTPQLIOTDDLKDNRQSETIARAUJMLEAVSQIMRNYEAAIHLPTVRTPS 1031 

Query: 1017 LDIEGVDVMETITKTSVDFKLPDFSTSKKQDPARLGSAVHELMQRIEMSSHVKMEDIQKA 1076 
L+ GVD++E +++ DF LP FS K + + +GSA+H+LMQ + +S + + + A 
Sbjct: 1032 LEPIGVDIIEKSSRSLSDFTLPHFSKKAKVEASHIGSALHQLMQVLPLSKPINQQTLLDA 1091 

Query: 1077 LTEVNAETSVKAAIQIEKINYFFQETSLGKYIQEEVEHLHREAPFAMLKEDPESGEKFW 1136 

L +++ VK A+ ++KI FF +TSLG++ Q +HL+REAPFA+LK DP S E++V+ 
Sbjct: 1092 LRGIDSNEEVKTALDLKKIESFFCDTSLGQFFQTYQKHLYREAPFAILKLDPISQEEYVL 1151 

Query: 1137 RGIIDGYLLLENRIILFDYKTDKFVNPLELKERYQGQMALYAEALKKSYEIEKIDKYLIL 1196 

RGIID Y L ++ I+L DYKTDK+ P+ELK+RYQ Q+ LYAEAL ++Y++ +YL+L 
Sbjct: 1152 RGIIDAYFLFDDHIVLVDYKTDKYKQPIEIjKKRYQQQLELYAEALTQTYKLPVTKRYLVL 1211 

Query: 1197 LGGKQLEWKM 1207 

+GG + E+V++ 
Sbjct: 1212 MGGGKPEIVEV 1222 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1399 

A DNA sequence (GBSx1484) was identified in S.agalactiae <SEQ ID 4293> which encodes the amino 
acid sequence <SEQ ID 4294>. This protein is predicted to be exonuclease RexB. Analysis of this protein 
sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0660 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 
The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC12965 GB:U76424 exonuclease RexB [Lactococcus lactis] 
Identities = 363/1093 (33%), Positives = 604/1093 (55%), Gaps = 67/1093 (6%) 

Query: 1 MKLLYTDINHDMTEILVNQAAHAAEAGWRIFYIAPNSLSFEKERAVLENLPQ EASFA 57 

M++LYT+I D+TE L+ A E +++YI P+S+SFEKE+ +LE L + A F 
Sbjct: 1 MEILYTEITQDLTEGLLEIALEELEKNRKVYYIVPSSMSFEKEKEILERLAKGSDTAVFD 60 

Query: 58 ITITRFAQLARYFTMQP-NQKESLiNDIGLMIFYRALASFEDGQLKVFGRLKQDASFIS 116 

+ +TRF QL YF + K L +GL+M+F R L SF+ ++ ++ L+ A F+ 
Sbjct: 61 LLVTRFKQLPYYFDKREKATMKTELGTVGLSMLFRRVLRSFKKDEIPLYFSLQDSAGFLE 120 

Query: 117 QLVDLYKELQTANLSILELKYLHSPEKFEDLLAIFLWSDLLREGEYDNQSKIAFFTEQV 176 

L+ L EL TANLS+ L ++ + +LA F + EY N S+ FT ++ 
Sbjct: 121 MLIQLRAELLTANLSVENLPDNPKNQELKKILAKFEAELSV EYANYSEFGDFTNRL 176 

Query: 177 RSGQLDVDLKNTILI vDGFTRFSAEEEALIKSLSSRCQEIIIGAYASQKAYKANFTNGNI 236 

G+ D LK+ + 1 +DG+TRFSAEEE I+S+ + ++G Y+ + + A + 1 
Sbjct: 177 VDGEFDQQLKDVTIIIDGYTRFSAEEELFIESIQEKVARFWGTYSDENSLTAG--SETI 234 

Query: 237 YSAGVDFLRYIATTFQTKPEFILSKWESKSGFEMISK NIEGKHDFTNSSHILDDT 291 

Y + T F+ K L K S + E+ SK +++ + T+ L 

Sbjct: 235 YVGTSQMI TRFRNKFPVELRKIASSAVNEVYSKLTRILDLDSRFVITDEKIELKAE 290 

Query: 292 AKDCITIVffiCINQKDEVEHVARAIRQKLYQGYRYKDILVLLGDVDSYKLQLSKIFEQYDI 351 

+ IWE NQK E+E VA+ IRQK+ QG +KD VL+GD +Y++ L ++F+ Y+I 
Sbjct: 291 DEKYFRIWFAENQKVEIERVAKEIRQKIIQGAFFKDFTVLVGDPAAYEITLKEVFDLYEI 350 

Query: 352 PYYFGKAETMAAHPLVHFMDSLSRIKRYRFRAEDVLNLFKTGIYGEISQDD- -LDYFEAY 409 

P +++ + E+M+ HPLV F +SL IK+ +R +DV+NL K+ +Y + + D+ +DYFE Y 
Sbjct: 351 PFFYAQEESMSQHPLVIFFESLFAIKKNNYRTDDVVNLLKSKVYTDANLDEEVIDYFEYY 410 

Query: 410 ISYADIKGPKKFFTDFWGAKKFDLGRLNTIRQSIiL TPLESFV-KTKKQDGIKTLNQ 465 

+ I G KKF +F+ ++ + +N +R+ LL +PL+ F+ +K+ G K ++ 
Sbjct: 411 VQKYKISGRKKFTEEFIE-SEFSQIELWEMREKLLGSESPLQVFLGNNRKKTGKKWVSD 469 

Query: 466 FMFFLTQVGLSDNLSRLVGQMS-ENEQE KHQEVWKTFTDILEQFQTIFGQEKLNLDE 521 

L + N++ +NE + KH++VW+ L +F +F EKL E 

Sbjct: 470 LQGLLENGNVMTNMNAYFSAAELQNEHQMADKHEQVWQML I STLNEFLAVFSDEKLKSVE 529 

Query: 522 FLSLLNSGWQAEYRWPATVDVVTVKSYDLVEPHSNQFWALGMTQSHFPKIAQNKSLI 581 

FL +L +G+ A+YR +PA VDW VK Y+LVEP +N+++YA+G++Q++FP+I +N +L+ 
Sbjct: 530 FLDILIiAGLKNAKYRQIPANVDVWVKDYELWPKTNKYIYAIGLSQTNFPRIKKNSTLL 589 

Query: 582 SDIERQLINDANDTDGHFDIMTQENLKKiraFAALSLFNAAKQELVLTIPQLLNESEDQMS 641 

SD ER IN D + + + N +KN F LSL N+AK+ LVL++PQ++ + + S 
Sbjct: 590 SDEERLEINQTTDENQFIEQLNVANYQKNQFTVLSLINSAKESLVLSMPQIMANEQGEFS 649 

Query: 642 P-YLVELRDIGVPFNHKGR-QSLKEEADNIGNYKALLSRVVDLYRSAIDKEMTKEE-QTF 698 

P + + L+D K + +L E ++IGN +++++ + +R ++ETE++F 

Sbjct: 650 PVFQLFLKDADEKILQKIQGVNLFESLEHIGNSRSVIAMIGQIERELVESEETSEDKRVF 709 

Query: 699 WSVATOYLRRQLTSKGIEIPIITDSLDTVTVSSDVMTRRFPEDDPLKLSSSALTTFYNNQ 758 

WS R L + + + +DTV ++D + + + D + SS+ FYN + 

Sbjct: 710 WSSIFRILVKSNADFQKILLDLAKDIDTVNLAPDTLEQIY- -GDKIYASVSSFERFYNCE 767 

Query: 759 YKYFLQYVLGLEEQDS IHPDMRHHGTYLHRVFEILMKNQGI - - ES FEEKLNSAINKTNQE 816 

Y+YFL+ L LE ++I + + G + H VFE +MK + E+F+EKL + + ++ 
Sbjct: 768 YQYFLEISTTLSIiETFENIDINSKIVGNFFHEVFEKVMKETDLSAENFDEKLTLVLQEVDKN 827 

Query: 817 DVFKSLYSEDAESRYSLEILEDIARATATILR QDSQMTVESE EERFELM 865 

+ +++DA +R++ LE+I R TAT+L+ D T+ +E E 

Sbjct: 828 --YSRYFTQDATARFTWSlS&EEITOQTATVLKATVSTDELKTLLTESSFGIiPKSELGNFS 885 

Query: 866 IDOTIKINGIIDRIDRLSDGSLGVTOYKSSAQKFDIQKFYNGIjSPQLVTYIDAISRDKEV 925 

+D+ I + G IDR+D+LS LG +DYKSSA F +Q+ Y+GLS Q +TY+D I K+ 
Sbjct: 886 VDD-IYLRGRIDRLDQLSTDYLGAIDYKSSAHSFKLQEAYDGLSLQFMTYLDVI KQA 941 
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10 



Query: 926 
Sbjct: 942 
Query: 985 



EQKPPIFGAMYLHMQEPRQDLSKIKNLDDLVTKNHQALTYKGLFSFAEKEFIANGKYHL- 984 

I+GA+YL + +LS+I L ++ +++ Y+GL E E + G ++ 

FPNQKIWGALYLQFKNQPINLSEINQLSEIANILKESMRYEGLVLEDAAEQI - KGIENIA 1000 



-Q 1033 



-KDSLYSETEIAILQAHNQSLYKKASETIKSGKFLINPYTEDAKTVDGD 

K ++Y+E E L N+ Y+ A + +K GK INP + ++ +D 
Sbjct: 1001 LKKTNIY^EEFEQLLKLNEEHYRAAGQRLKKGKIAINPIMKRSEGIDQSGNVRGCRYCP 1060 

Query: 1034 FKSITGFEADRHM 1046 

KSI FEA+ HM 
Sbjct: 1061 LKSICRFEANIHM 1073 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4295> which encodes the amino acid 
sequence <SEQ ID 4296>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1891 (Affirmative) < suco 
bacterial membrane Certainty=0.0000 (Not Clear) < suco 
bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 546/1075 (50%), Positives = 758/1075 (69%), Gaps = 11/1075 (1%) 

Query: 1 MKLLYTDINHDMTEILVNQAAHAAEAGWRIFYIAPNSLSFEKERAVLENLPQEASFAITI 60 
MKL+YT++++ MTEILVN+A AA+ G+R+FYIAPNSLSFEKER VL LP+ +F+I + 
Sbjct: 1 MKLIYTEMSYSMTEILVNEARKAADCK3YRVFYIAPNSLSFEKEREVLTLLPERGTFSIIV 60 

Query: 61 TRFAQLARYFTLNQPNQKESIiNDIGLftMiFYRaLASFEDGQLKVFGRLKQDASFISQLvX) 120 
TRF Q++RYFT+ K+ L+D LAMIFYRAL + L +GRL+ ++ FI QLV+ 
Sbjct: 61 TRFVQMSRYFTVESSPSKQHLDDTTLAMIFYRALMQLKPEDLPSYGRLQNNSVFIEQLVE 120 

Query: 121 LYKELQTANLSILELKYLHSPKKFEDLLAIFLWSDLLREGEYDNQSKIAFFTEQVRSGQ 180 
LYKEL+ A LS+ +L L P+K EDL+ I + ++ + +Y+ S + F ++ G 
Sbjct: 121 LYKELKNAQLSVHDLTGLDHPQKQEDLIKIIELAETIMIQQDYNQDSPLQSFARAIKLGL 180 

Query: 181 LDVDLKNTILIVDGFTRFSAEEEALIKSLSSRCQEIIIGAYASQKAYKANFTNGNIYSAG 240 
L+ L T++++DGF+RFSAEE+ L+ L++ CQE+IIG+Y SQKAY+ +F GNIY A 
Sbjct: 181 IiNNQLSKTVWIDGFSRFSAEEDYLLSLIJSmCQEVIIGSYVSQKAYQKSFIKGNIYEAS 240 

Query: 241 
+ FL+ LA + KP F S 
K F +++ E HDF+ 
L + D ++W+ 
Sbjct: 241 

Query: 301 CINQKDEVEHVARAIRQKLYQGYRYKDILVLLGDVDSYKLQLSKIFEQYDIPYYFGKAET 360 
C +QK+E+EHVA++IRQKLY+GYRYKDILVLLGD+D+Y+LQ+ IF++++IPYY GKAE 
Sbjct: 301 CHHQKEEIEHVAKSIRQKLYEGYRYKDILVLLGDMDAYQLQIGPIFDKFEIPYYLGKAEP 360 

Query: 361 MAAHPLVHFMDSLSRIKRYRFRAEDVLNLFKTGIYGEISQDDLDYFEAYISYADIKGPKK 420 
MAAHPLV F++SL R +RY +R ED+LN+ K+G++G D+D FE Y +ADIKG K 
Sbjct: 361 MAAHPLVQFIESLERSQRYNWRREDILNMLKSGLFGCFDDSDIDRFEEYTQFADIKGFTK 420 

Query: 421 FFTDFW-GAKKFDLGRLOTIRQSLLTPLESFVKTKKQDGIKTLNQFMFFLTQVGLSDNL 479 
F F + ++++ L LN +RQ ++ PL+ K++KQ G +++ + FL ++ L++N+ 
Sbjct: 421 FSKPFTINSSRQYPLDFLNEMRQDIVLPLQELFKSQKQLGASLVDKLILFLKKIRLAENM 480 

Query: 480 SRLVGQMSENEQEKHQEWKTFTDILEQFQTIFGQEKLNLDEFLSLLNSGMMQAEYRMVP 539 
L S+ E EK++EVWK FTDIL F IFGQEKL L + L+L+ +GM A+YR+VP 
Sbjct: 481 QGLA--QSQLEVEKNEEVWKRFTDILTSFHHIFGQEKLRLSDCLALIKTGMKSAQYRWP 538 

Query: 540 
AT+DWT+KSYDLV+PHS FVYA+G+TQSHFPK + L+SD ER IN+ + HF 
Sbjct: 539 
Query: 600 DIMTQENLKKNHFAJUIiSLFNAAKQELVLTIPQLIjNKSEDQMSPYLVELRDIGVPFNHKGR 659 

DI + EN KKNH ALSLFNAA +ELVL++ ++NE+ D +SPYL EL + G+P KG+ 
Sbjct: 598 DIASAENSKKNHQTALSLFNAATKELVLSVSTVINETFDDLSPYLKELINFGLPLLDKGK 657 

Query: 660 QSLKEEADNIGNYKALLSRWDLYRSAIDKEMTKEEQTFWSVAVRYLRRQLTSKGIEIPI 719 

L + +IGNYKALLS+++ + R + EM+ +++ FW+V +RYLR+QL + +E+P 
Sbjct: 658 NYLSYDNSDIGNYKALLSQIIAINRQDL-IEMSDQDKMFWTVVLRYLRKQLRKQQLELPT 716 

Query: 720 ITDSLDTVTVSSDWTRRFPEDDPLKLSSSALTTFYNNQYKYFLQYVLGLEEQDSIHPDM 779 

L T +S +V+ FP+ PLKLS++ALT FYNNQY YFL+YVL L + +SIHPD 
Sbjct: 717 SDYRLSTKPLSKEVIEVCFPKGIPLKLSATALTVFYNNQYNYFLKYVLNLNKTESIHPDS 776 

Query: 780 RHHGTYLHRVFEILMKNQGIESFEEKLNSAINKTNQEDVFKSLYSEDAESRYSLEILEDI 839 
R HG YLHRVFE LMK+ E F+ KL AI TNQE F+ +Y ++AE+ YSL ILEDI 

Sbjct: 777 RIHGQYLHRVFERLMKDHTQEPFDNKLKQAIYHTNQESFFQQVYQDNAEAEYSLAILEDI 836 

Query: 840 ARATATILRQDSQMTVESEEERFELMIDNTIKINGIIDRIDRLSDGSLGWDYKSSAQKF 899 
R+TA IL+ + + V +E+ F+L + N I ++GIIDRID+LSDGSLG+VDYKSSA +F 
Sbjct: 837 VRSTAPILQLNQNIQVIDQEKNFQLDMGNEILVHGIIDRIDQLSDGSLGIVDYKSSANQF 896 

Query: 900 DIQKFYNGLSPQLVTYIDAISR--DKEVEQKPPIFGAMYLHMQEPRQDLSKIKNLDD-LV 956 

DI FYNGLSPQL+TY+ A+ + ++ Q +FGAMYLH+Q+P+ DL K +D+ LV 
Sbjct: 897 DIGTFYNGLSPQLMTYLAALKQIAPHDINQ---LFGAMYLHLQDPKLDLVTFKQIDNTLV 953 






Query: 957 TKNHQALTYKGLFSEAEKEFLANGKYHLKDSLYSETEIAILQAHNQSLYKKASETIKSGK 1016 

++ALTYKG+FSE EKE L+ G Y K++LYS E+ L +N+ LY KA++ IK G 
Sbjct: 954 ESIYKALTYKGIFSEVEKEHLSTGAYQTKNALYSNDELETLLNYNKYLYLKAAKHIKKGH 1013 



Query: 1017 FLINPYTEDAKTVDGDQFKSITGFEADRHMARARALYKLPAKEKRQGFLTLMQQE 1071 

FLINPYT D KTV GDQ K+IT FEAD M +AR L LPAKEK++ FLTLM++E 
Sbjct: 1014 FLINPYTSDGKTVQGDQLKAITRFEADLDMGQARRLVTLPAKEKKECFLTLMRKE 1068 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1400 

A DNA sequence (GBSx1485) was identified in S.agalactiae <SEQ ID 4297> which encodes the amino 
acid sequence <SEQ ID 4298>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.80 Transmembrane 51 - 67 ( 44 - 69) 

Final Results 

bacterial membrane Certainty=0.4121 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>
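Each "Final Results" block gives a certainty score and a call for three bacterial compartments. As a minimal sketch, assuming the cleaned line layout used in this section (the localization program itself is not reproduced here), such a block can be parsed and the highest-scoring "Affirmative" compartment taken as the predicted location. The helper names `parse_final_results` and `predicted_location` are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical parser for lines such as
# "bacterial membrane Certainty=0.4121 (Affirmative) <succ>".
LINE = re.compile(
    r"^bacterial (?P<loc>cytoplasm|membrane|outside)\s+"
    r"Certainty=(?P<score>\d+\.\d+)\s+\((?P<call>[^)]*)\)"
)


def parse_final_results(text: str) -> dict[str, tuple[float, str]]:
    """Map each compartment to its (certainty, call) pair."""
    results: dict[str, tuple[float, str]] = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match:
            results[match["loc"]] = (float(match["score"]), match["call"])
    return results


def predicted_location(results: dict[str, tuple[float, str]]) -> str | None:
    """Return the 'Affirmative' compartment with the highest certainty, if any."""
    affirmative = {loc: score for loc, (score, call) in results.items()
                   if call == "Affirmative"}
    if not affirmative:
        return None
    return max(affirmative, key=affirmative.get)


# Final Results block for GBSx1485 (SEQ ID 4298), copied from above.
block = """bacterial membrane Certainty=0.4121 (Affirmative) <succ>
bacterial outside Certainty=0.0000 (Not Clear) <succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>"""

print(predicted_location(parse_final_results(block)))  # membrane
```

When all three calls are "Not Clear", as in the first analysis of Example 1406, the helper returns None instead of picking a compartment.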

A related GBS nucleic acid sequence <SEQ ID 8799> which encodes amino acid sequence <SEQ ID 8800> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: -20.62 
GvH: Signal Score (-7.5): -6.25 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -7.80 threshold: 0.0 

INTEGRAL Likelihood = -7.80 Transmembrane 47 - 63 ( 40 - 65) 
PERIPHERAL Likelihood = 3.34 26 
modified ALOM score: 2.06 






*** Reasoning Step: 3 
Final Results 

bacterial membrane Certainty=0.4121 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75528 GB:AE000334 orf, hypothetical protein [Escherichia coli K12] 
Identities = 138/297 (46%), Positives = 193/297 (64%), Gaps = 16/297 (5%) 

Query: 5 MKIDDLRKSDNVEDRRSSSGGSFSSGGSGLPILQLLLLRGSWKTKLVVLIILLLLG--GG 62 

M+ R+SDNVEDRR+SSGG S GG G + S K L++LI++L+ G G 

Sbjct: 1 MRWQGRRESDNVEDRRNSSGGP - SMGGPGFRL PSGKGGLILLIVVLVAGYYGV 52 

Query: 63 GLTSIFNDSSSPSSYQSQNVSRSVDNSATREQIDFVNKVLGSTEDFWSQEFQTQGFGNYK 122 

LT + ++++S + D +A F + +L +TED W Q+F+ G Y+ 

Sbjct: 53 DLTGLMTGQPVSQQQSTRS I S PNEDEAAK FTSVI LATTEDTWGQQFEKMG - KTYQ 106 

Query: 123 EPKLVLYTNSIQTGCGIGESASGPFYCSADKKIYLDISFYNELSHKYGATGDFAMAYVIA 182 

+PKLV+Y +TGCG G+S GPFYC AD +Y+D+SFY+++ K GA GDFA YVIA 
Sbjct: 107 QPKLVMYRGMTRTGCGAGQSIMGPFYCPADGTVYIDLSFYDDMKDKLGADGDFAQGYVIA 166 

Query: 183 HEVGHHIQTELGIMDKYNRMRHGLTKKEANALNVRLELQADYYAGVWAHYIRGKNLLEQG 242 

HEVGHH+Q LGI K +++ T+ E N L+VR+ELQAD +AGVW H ++ + +LE G 
Sbjct: 167 HEVGHHVQKLLGIEPKVRQLQQNATQAEVNRLSVRMELQADCFAGVWGHSMQQQGVLETG 226 

Query: 243 DFEEAMNAAHAVGDDTLQKETYGKLVPDSFTHGTAEQRQRWFNKGFQYGDIQHGDTF 299 

D EEA+NAA A+GDD LQ+++ G+ +VPDS FTHGT+ + QR WF +GF GD +TF 
Sbjct: 227 DLEEALNAAQAIGDDRLQQQSQGRVVPDSFTHGTSQQRYSWFKRGFDSGDPAQCNTF 283 
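The Identities, Positives and Gaps figures are counts over the aligned length, followed by an integer percentage. In the E. coli (AAC75528) alignment above, those percentages fit truncation toward zero rather than rounding: for example, 193/297 is 64.98% and is reported as 64%. The sketch below assumes that convention. It is an inference from these figures, not a documented property of the search program, and some summaries elsewhere in this section do not fit it (Positives = 327/337 is 97.03% but is reported as 96%). `blast_percent` is a hypothetical helper name.

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * count) // aligned_length


# Summary line for GBSx1485 vs. E. coli AAC75528 (aligned length 297).
for label, count in (("Identities", 138), ("Positives", 193), ("Gaps", 16)):
    # prints 46%, 64% and 5%, matching the summary line above
    print(f"{label} = {count}/297 ({blast_percent(count, 297)}%)")
```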

A related DNA sequence was identified in S.pyogenes <SEQ ID 4299> which encodes the amino acid 
sequence <SEQ ID 4300>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.42 Transmembrane 48 - 64 ( 41 - 67) 



Final Results 

bacterial membrane Certainty=0.3569 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC75528 GB:AE000334 orf, hypothetical protein [Escherichia coli] 
Identities = 143/301 (47%), Positives = 195/301 (64%), Gaps = 21/301 (6%) 

Query: 1 MKTDDLRESQQVEDRRGQSSG-SFGGGGLGGGLLLQLLFSRGGWKTKLVILLLLLVMG-- 57 

M+ RES VEDRR S G S GG G +L +GG L++L+++LV G 
Sbjct: 1 MRWQGRRESDNVEDRRNSSGGPSMGGPGF RLPSGKGG LILLIVVLVAGYY 50 

Query: 58 GGGLSGVLGGKPSSTNNNAYQSSQVTRTNGDKASQEQVSFVSKVFASTEDYWTKTFREKG 117 

G L+G++ G+P S QS++ N D+A++ F S + A+TED W + F + G 
Sbjct: 51 GVDLTGLMTGQPVSQQ QSTRS I SPNEDEAAK FTSVI LATTEDTWGQQFEKMG 102 

Query: 118 LTYHKPTLVLYTGATQTACGRGQASSGPFYCPGDQKVYLDISFYNELSTKYGAKGDFAMA 177 

TY +P LV+Y G T+T CG GQ+ GPFYCP D VY+D+SFY+++ K GA GDFA 
Sbjct: 103 KTYQQPKLVMYRGMTRTGCGAGQSIMGPFYCPADGTVYIDLSFYDDMKDKLGADGDFAQG 162 

Query: 178 YVIAHEVGHHIQNELGIMDNYASARQGKSKAKANQLNVKLELQADYYAGAWANYVQGQGL 237 

YVIAHEVGHH+Q LGI +Q ++A+ N+L+V++ELQAD +AG W + +Q QG+ 

Sbjct: 163 YVIAHEVGHHVQKLLGIEPKVRQLQQNATQAEVNRLSVRMELQADCFAGVWGHSMQQQGV 222 

Query: 238 LEKGDIEEAMAAAHAVGDDTLQEETYGRTVPDSFTHGTSKQRQRWFDRGYQYGDFEHGDTF 298 

LE GD+EEA+ AA A+GDD LQ+++ GR VPDSFTHGTS+QR WF RG+ GD +TF 
Sbjct: 223 LETGDLEEALNAAQAIGDDRLQQQSQGRVVPDSFTHGTSQQRYSWFKRGFDSGDPAQCNTF 283 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 191/303 (63%), Positives = 241/303 (79%), Gaps = 5/303 (1%) 



Query: 5 MKIDDLRKSDNVEDRRSSSGGSFSSGG-SGLPILQLLLLRGSWKTKLVVLIILLLLGGGG 63

MK DDLR+S  VEDRR  S GSF  GG  G  +LQLL  RG WKTKLV+L++LL++GGGG
Sbjct: 1 MKTDDLRESQQVEDRRGQSSGSFGGGGLGGGLLLQLLFSRGGWKTKLVILLLLLVMGGGG 60

Query: 64 LTSIFN---DSSSPSSYQSQNVSRSVDNSATREQIDFVNKVLGSTEDFWSQEFQTQGFGN 120

L+ + S++ ++YQS V+R+ + A++EQ+ FV+KV STED+W++ F+ +G
Sbjct: 61 LSGVLGGKPSSTNNNAYQSSQVTRTNGDKASQEQVSFVSKVFASTEDYWTKTFREKGL-T 119

Query: 121 YKEPKLVLYTNSIQTGCGIGESASGPFYCSADKKIYLDISFYNELSHKYGATGDFAMAYV 180

Y +P LVLYT + QT CG G+++SGPFYC D+K+YLDISFYNELS KYGA GDFAMAYV
Sbjct: 120 YHKPTLVLYTGATQTACGRGQASSGPFYCPGDQKVYLDISFYNELSTKYGAKGDFAMAYV 179

Query: 181 IAHEVGHHIQTELGIMDKYNRMRHGLTKKEANALNVRLELQADYYAGVWAHYIRGKNLLE 240

IAHEVGHHIQ ELGIMD Y R G +K +AN LNV+LELQADYYAG WA+Y++G+ LLE
Sbjct: 180 IAHEVGHHIQNELGIMDNYASARQGKSKAKANQLNVKLELQADYYAGAWANYVQGQGLLE 239

Query: 241 QGDFEEAMNAAHAVGDDTLQKETYGKLVPDSFTHGTAEQRQRWFNKGFQYGDIQHGDTFS 300

+GD EEAM AAHAVGDDTLQ+ETYG+ VPDSFTHGT++QRQRWF++G+QYGD +HGDTFS
Sbjct: 240 KGDIEEAMAAAHAVGDDTLQEETYGRTVPDSFTHGTSKQRQRWFDRGYQYGDFEHGDTFS 299

Query: 301 VEH 303

+ +
Sbjct: 300 IPY 302





SEQ ID 8800 (GBS404) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 3; MW 62kDa). 

GBS404-GST was purified as shown in Figure 218, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1401 

A DNA sequence (GBSx1486) was identified in S.agalactiae <SEQ ID 4301> which encodes the amino 
acid sequence <SEQ ID 4302>. This protein is predicted to be phenylalanyl-tRNA synthetase beta chain 
(pheT). Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2617 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14823 GB:Z99118 phenylalanyl-tRNA synthetase (beta subunit) 
[Bacillus subtilis] 
Identities = 376/805 (46%), Positives = 523/805 (64%), Gaps = 6/805 (0%) 

Query: 1 MLVSYKWLKELVDVD-VTTAELAEKMSTTGIEVEGVETPAEGLSKLVVGHIVSCEDVPDT 59 

M VSYKWL++ VD+ + A LAEK++ GIEVEG+E EG+ +V+GH++ E P+ 
Sbjct: 1 MFVSYKWLEDYVDLKGMDPAVLAEKITRAGIEVEGIEYKGEGIKGVVIGHVLEREQHPNA 60 

Query: 60 H-LHLCQVDTGDDELRQVVCGAPNVKTGINVIVAVPGARIADNYKIKKGKIRGMESLGMI 118 

L+ C VD G + Q++CGAPNV G V VA GA + N+KIKK K+RG ES GMI 
Sbjct: 61 DKLNKCLVDIGAFAPVQIICGAPNVDKGQKVAVATVGAVLPGNFKIKKAKLRGEESNGMI 120 
Query: 119 CSLQELGLSESIIPKEFSDGIQILPEGAIPGDSIFSYLDLDDEIIELSITPNRADALSMR 178 

CSLQELG+ ++ KE+++GI + P A G + L LDD I+EL +TPNRADA++M 
Sbjct: 121 CSLQELGIESKLVAKEYAEGICTFPNDAETGSDALAALQLDDAILELGLT^ 180 

Query: 179 GVAHEVAAIYGKKVHFEEKNLIEEAERAADKISVVIESDKVLS-YSARIVKNVTVAPSPQ 237 

GVA+EVAAI +V + + +E+A+D ISV IE + Y+A+I+KNVT+APSP 
Sbjct: 181 GVAYEVAAILDTEVKLPQTDYPAASEQASDYISVKIEDQEANPLYTAKIIKNVTIAPSPL 240 

Query: 238 WLQNKLMNAGIRPINNVVDVTNYVLLTYGQPMHAFDFDKFDGTTIVARNAENGEKLITLD 297 
W+Q KLMNAGIRP NNVVD+TN+VLL YGQP+HAFD+D+F +V R A E ++TLD 

Sbjct: 241 WMQTKL^AGIRPHNNWDITNFVLLEY 300 

Query: 298 GEERDLIADDLVIAVMDQPVALAGVMGGQSTEIGSSSKTVVLEAAVFNGTSIRKTSGRLN 357 
+ER L AD LVI + A+AGVMGG +E+ +KT++LEAA FNG +RK S h 
Sbjct: 301 DQERKLSADHLVITNGTKAQAVAGVMGGAESEVQEDTKTILLEAAYFMGQKVRKASKDLG 360 

Query: 358 LRSESSSRFEKGINYDTVSEAMDFAAAMLQELAGGQVLSGQVTEGVLPTEPVEVSTTLGY 417 

LRSESS RFEKGI+ V A + AA ++ AGG+VL+G V E L E + + 
Sbjct: 361 LRSESSVRFEKGIDPARVRLAAERAAQLIHLYAGGEVLAGTVEEDHLTIEANNIHVSADK 420 


Query: 418 VNTRLGTELTYTDIEEVFEKLGFAISGSEVKFTVLVPRRRWDIAIQADLVEEIARIYGYE 477 

V++ LG ++ ++ ++++LGF + ++ V VP RR DI I+ DL+EE AR+YGY+ 
Sbjct: 421 VSSVLGLTISKEELISIYKRLGFIVGEADDLLVVTVPSRRGDITIEEDLIEEAARLYGYD 480 

Query: 478 KLPTTLPEAGATAGELTSMQRLRRRVRTVAEGAGLSEIITYALTTPEKAVQFSTQATNIT 537 

+P+TLPE TGLT Q RR+VR EGAGLS+ ITY+LT +KA F+ + + T 
Sbjct: 481 NIPSTLPETAGTTGGLTPYQAKRRKVRRFLECSAGLSQAITYSLMffiKKATAFAIEKSLNT 540 

Query: 538 ELMWPMTVDRSALRQNVVSGMLDTIAYNVARKNSNLAVYEIGKVFEQTGNPKEDLPTEVE 597 
L PM+ +RS LR ++V +LD+++YN+AR+ ++A+YE+G VF +4- PEE 

Sbjct: 541 VLftLPMSEERSILRHSLVPOTjLDSVSYOTJU?QTDSV!ALYEVGSW--LTKEEDTKPVETE 598 

Query: 598 TFTFALTGLVEEKDFQTKSKPVDFFYAKGIVEALFIKLK-LDVTFVAQKGLASMHPGRTA 656 
A+TGL ++ +Q + KPVDFF KGIVE L KL LD Q +HPGRTA 

Sbjct: 599 RVAG&VTGLWRKQLWQGEKKPVDFFVVKGIVE^^ 658 

Query: 657 TILLDGKEIGFVGQVHPQTAKQYDIPETYVAEINLSTIESQMNQALIFEDITKYPSVSRD 716 

ILL+G IGF+GQVHP K+ DI ETYV E++L + + L++ I KYPSV+RD 

Sbjct: 659 NILLNGSLIGFIGQVHPSLEKELDIKETYVFELDLHALLAAETAPLVYTAIPKYPSVTRD 718 


Query: 717 IALLLAESVSHHDIVSAIETSGVKRLTAIKLFDVYAGNNIAEGYKSMAYSLTFQNPNDNL 776 

IAL+ ++V+ + S I+ +G K L + +FDVY G ++ EG KS+A+SL + NP L 
Sbjct: 719 IALVTDKTVTSGQLESVIKEAGGKLLKEVVVFDVYEGEHMEEGKKSVAFSLQYVNPEQTL 778 

Query: 777 TDEEVAKYMEKITKSLVEKVNAEIR 801 

T+EEV K K+ K+L + A +R 
Sbjct: 779 TEEEVTKAHSKVLKALEDTYQAVLR 803 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4303> which encodes the amino acid 
sequence <SEQ ID 4304>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1283 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 595/801 (74%), Positives = 687/801 (85%) 

Query: 1 MLVSYKWLKELVDVDVTTAELAEKMSTTGIEVEGVETPAEGLSKLVVGHIVSCEDVPDTH 60 

MLVSYKWLKELVD+DVT A LAEKMSTTGIEVEG+E PA+GLSKLVVGH++SCEDVP+TH 
Sbjct: 6 MLVSYKWLKELVDIDVTPAALAEKMSTTGIEVEGIEVPADGLSKLVVGHVLSCEDVPETH 65 

Query: 61 LHLCQVDTGDDELRQVVCGAPNVKTGINVIVAVPGARIADNYKIKKGKIRGMESLGMICS 120 

LHLCQVDTGD+ RQ+VCGAPNVK GI VIVAVPGARIADNYKIKKGKIRGMESLGMICS 
Sbjct: 66 LHLCQVDTGDETPRQIVCGAPNVKAGIKVIVAVPGARIADNYKIKKGKIRGMESLGMICS 125 

Query: 121 LQELGLSESIIPKEFSDGIQILPEGAIPGDSIFSYLDLDDEIIELSITPNRADALSMRGV 180 

LQELGLS+SIIPKEFSDGIQILPE A+PGD+IF YLDLDD IIELSITPNRADALSMRGV 
Sbjct: 126 LQELGLSDSIIPKEFSDGIQILPEEAVPGDAIFKYLDLDDHIIELSITPNRADALSMRGV 185 

Query: 181 AHEVAAIYGKKVHFEEKNLIEEAERAADKISVVIESDKVLSYSARIVKNVTVAPSPQWLQ 240 
AHEVAAIYGK V F +KNL E + ++ I V I SD VL+Y++R+V+NV V PSPQWLQ 

Sbjct: 186 AHEVAAIYGKSVSFPQKNLQESDKATSEAIEVAIASDNVLTYASRVVENVKVKPSPQWLQ 245 

Query: 241 NKLMNAGIRPINNVVDVTNYVLLTYGQPMHAFDFDKFDGTTIVARNAENGEKLITLDGEE 300 
N LMNAGIRPINNVVDVTNYVLL +GQPMHAFD+DKF+ IVAR A GE L+TLDGE+ 
Sbjct: 246 NLLMNAGIRPINNVVDVTNYVLLYFGQPMHAFDYDKFEDHKIVARAARQGESLVTLDGEK 305 

Query: 301 RDLIADDLVIAVMDQPVALAGVMGGQSTEIGSSSKTVVLEAAVFNGTSIRKTSGRLNLRS 360 

RDL +DLVI V D+PVALAGVMGGQ+TEI ++S+TVVLEAAVF+G SIRKTSGRLNLRS 
Sbjct: 306 RDLTTEDLVITVADKPVALAGVMGGQATEIDANSQTVVLEAAVFDGKSIRKTSGRLNLRS 365 


Query: 361 ESSSRFEKGINYDTVSEAMDFAAAMLQELAGGQVLSGQVTEGVLPTEPVEVSTTLGYVNT 420 

ESSSRFEKG+NY TV EA+DFAAAMLQELA GQVLSG V G LPTEPVEVST+L YVN 
Sbjct: 366 ESSSRFEKGVNYATVLEALDFAAAMLQELAEGQVLSGHVQAGQLPTEPVEVSTSLDYVNV 425 

Query: 421 RLGTELTYTDIEEVFEKLGFAISGSEVKFTVLVPRRRWDIAIQADLVEEIARIYGYEKLP 480 

RLGTELT+ DI+ +F++LGF ++G E FTV VPRRRWD++I ADLVEEIARIYGY+KLP 
Sbjct: 426 RLGTELTFADIQRIFDQLGFGLTGDETSFTVAVPRRRWDVSIPADLVEEIARIYGYDKLP 485 

Query: 481 TTLPEAGATAGELTSMQRLRRRVRTVAEGAGLSEIITYALTTPEKAVQFSTQATNITELM 540 
TTLPEAG TA ELT Q LRR+VR +AEG GL+EII+YALTTPEKAV+F+ +++TELM 

Sbjct: 486 TTLPEAGGTAAELTPTQALRRKVRGLAEGLGLTEIISYALTTPEKAVEFAVAPSHLTELM 545 

Query: 541 WPMTVDRSALRQNVVSGMLDTIAYNVARKNSNLAVYEIGKVFEQTGNPKEDLPTEVETFT 600 
WPM+V+RSALRQN+VSGMLDT+AYNVARK SNLA+YEIGK+FEQ NPKEDLP EV F 
Sbjct: 546 WPMSVERSALRQNMVSGMLDWAYNVARKQSNLALYEIGKIFEQEANPKEDLENEVNHFA 605 

Query: 601 FALTGLVEEKDFQTKSKPVDFFYAKGIVEALFIKLKLDVTFVAQKGLASMHPGRTATILL 660 

FA+ GLV +KDFQT+++ VDF++AKG ++ LF L L V +V K LA+MHPGRTA ILL 
Sbjct: 606 FAICGLVAQKDFQTQAQAVDFYHAKGNLDTLFANLNLKVQYVPTKDIANMHPGRTALILL 665 


Query: 661 DGKEIGFVGQVHPQTAKQYDIPETYVAEINLSTIESQMNQALIFEDITKYPSVSRDIALL 720 

D + IGFVGQVHP TAK Y IPETYVAE++++ +E+ + F +ITK+P+++RD+ALL 

Sbjct: 666 DEQVIGFVGQVHPGTAKAYSIPETYVAELDMAALEAALPSDQTFAEITKFPAMTRDVALL 725 

Query: 721 LAESVSHHDIVSAIETSGVKRLTAIKLFDVYAGNNIAEGYKSMAYSLTFQNPNDNLTDEE 780 

L VSH IV+AIE++GVKRLT+IKLFDVY GIG KSMAYSLTFQNPNDNLTDEE 
Sbjct: 726 LDREVSHQAIVTAIESAGVKRLTSIKLFDVYEGATIQAGKKSMAYSLTFQNPNDNLTDEE 785 

Query: 781 VAKYMEKITKSLVEKVNAEIR 801 
VAKYMEKITK+L E+V AE+R 

Sbjct: 786 VAKYMEKITKALTEQVGAEVR 806 
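The identity letters in each midline can be recomputed from the Query and Sbjct rows. The minimal sketch below uses the last row of the GAS/GBS pheT alignment above. It marks identical columns and counts gap columns, but it does not reproduce the "+" (positive-scoring substitution) marks. Those would need a substitution matrix (BLOSUM62 is the usual protein default), and which matrix was used here is not stated. The helper names are hypothetical.

```python
def alignment_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identical columns, gap columns, total columns) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, sbjct) if "-" in (q, s))
    return identities, gaps, len(query)


def identity_midline(query: str, sbjct: str) -> str:
    """Midline showing only identical residues; '+' marks are not computed."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, sbjct))


query = "VAKYMEKITKSLVEKVNAEIR"  # Query 781-801
sbjct = "VAKYMEKITKALTEQVGAEVR"  # Sbjct 786-806
print(identity_midline(query, sbjct))  # VAKYMEKITK L E V AE R
print(alignment_stats(query, sbjct))   # (16, 0, 21)
```

The identity letters agree with the published midline VAKYMEKITK+L E+V AE+R. The three "+" columns are the ones this sketch leaves blank.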

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1402 

A DNA sequence (GBSx1487) was identified in S.agalactiae <SEQ ID 4305> which encodes the amino 
acid sequence <SEQ ID 4306>. Analysis of this protein sequence reveals the following: 






Possible site: 43 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0653 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

A related GBS nucleic acid sequence <SEQ ID 9769> which encodes amino acid sequence <SEQ ID 9770> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15205 GB:Z99120 transcriptional regulator [Bacillus subtilis] 
Identities = 60/169 (35%), Positives = 100/169 (58%) 

Query: 17 ITFKKVGLDNVNILQNIAIETFRQTFSHDNSEEQLQAFENESYTLPVLKSEITHAESDTY 76 
+ KK +++ LQ ++IETF TF NS E ++A+ ++ L+ E+++ S + 

Sbjct: 3 VKMKKCSREDLQTLQQLSIETFNDTFKEQNSPENMKAYLESAFNTEQLEKELSNMSSQFF 62 

Query: 77 FVYLDTDLVGYLKVNWGSQQTEKDLDKAFEIQRIYLLDAYQGQGIGKATFEFALDLAYKS 136 
F+Y D ++ GY+KVN Q+E+ ++ EI+RIY+ +++Q G+GK A+++A + 

Sbjct: 63 FIYFDHEIAGYVKVNIDDAQSEEMGAESLEIERIYIKNSFQKHGLGKHLLNKAIEIALER 122 

Query: 137 GLDWAWLGVWEFNHKAQAFYAKYGFEKFSEHQFSVGDKVDTDWLLRKSL 185 

WLGVWE N A AFY K GF + H F +GD+ TD ++ K+L 
Sbjct: 123 NKKNIWLGVWEKNENAIAFYKKMGFVQTGAHSFYMGDEEQTDLIMAKTL 171 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1403 

A DNA sequence (GBSx1488) was identified in S.agalactiae <SEQ ID 4307> which encodes the amino 
acid sequence <SEQ ID 4308>. This protein is predicted to be phenylalanyl-tRNA synthetase (alpha 
subunit) (pheS). Analysis of this protein sequence reveals the following: 









Possible site: 45 
>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3937 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

A related GBS nucleic acid sequence <SEQ ID 9339> which encodes amino acid sequence <SEQ ID 9340> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14824 GB:Z99118 phenylalanyl-tRNA synthetase (alpha subunit) 
[Bacillus subtilis] 

Identities = 209/338 (61%), Positives = 270/338 (79%), Gaps = 2/338 (0%) 

Query: 1 MKISTQEKLKEM-TGNHTKELQDLRVQVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEV 59 
+K QE L+++ + K + D+RVQ LGKKG +TE+L+G+ LS + RP +G NEV 
Sbjct: 5 LKQLEQEALEQVEAASSLKVVNDIRVQYLGKKGPITEVLRGMGKLSAEERPKMGALANEV 64 

Query: 60 RDILTKAFEEQAKVVEAAKIQAQLESESVDVTLPGRQMTLGHRHVLTQTSEEIEDIFLGM 119 

R+ + A ++ + +E +++ +L +++DVTLPG + +G RH LT EEIED+F+GM 
Sbjct: 65 RERIANAIADKNEKLEEEEMKQKIAGQTIDVTLPGNPVAVGGRHPLTVVIEEIEDLFIGM 124 


Query: 120 GFQVVDGFEVEKDYYNFERMNLPKDHPARDMQDTFYITEEILLRTHTSPVQARTMDQHDF 179 

G+ V +G EVE DYYNFE +NLPK+HPARDMQD+FYITEE L+RT TSPVQ RTM++H+ 
Sbjct: 125 GYTVEEGPEVETDYYNFESLNLPKEHPARDMQDSFYITEETLMRTQTSPVQTRTMEKHE- 183 

Query: 180 SKGPLKMISPGRVFRRDTDDATHSHQFHQIEGLVVGENISMGDLKGTLQLISQKMFGAER 239 
KGP+K+I PG+V+RRD DDATHSHQF QIEGLVV +NISM DLKGTL+L+++KMFG +R 
Sbjct: 184 GKGPVKIICPGKVYRRDNDDATHSHQFMQIEGLVVDKNISMSDLKGTLELVAKKMFGQDR 243 

Query: 240 KIRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKQTGWIEILGAGMVHPSVLEMSGIDSE 299 
+IRLRPS+FPFTEPSVEVDV+CFKCGG GC+VCK TGWIEILGAGMVHP+VL+M+G D + 

Sbjct: 244 EIRLRPSFFPFTEPSVEVDVTCFKCGGNGCSVCRGTGWIEILGAGMVHPNVLKMAGFDPK 303 

Query: 300 KYSGFAFGLGQERIAMLRYGINDIRGFYQGDVRFTDQF 337 
+Y GFAFG+G ERIAML+YGI +DIR FY DVRF QF 
Sbjct: 304 EYQGFAFGMGVERIAMLKYGIDDIRHFYTNDVRFISQF 341 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4309> which encodes the amino acid 
sequence <SEQ ID 4310>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2806 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 305/337 (90%), Positives = 327/337 (96%) 

Query: 1 MKISTQEKLKEMTGNHTKELQDLRVQVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVR 60 

+K T E L+ +TGNHTKELQDLRV VLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVR 
Sbjct: 36 LKTKTLETLQSLTGNHTKELQDLRVAVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVR 95 

Query: 61 DILTKAFEEQAKVVEAAKIQAQLESESVDVTLPGRQMTLGHRHVLTQTSEEIEDIFLGMG 120 
D+LTKAFEEQAK+VEAAKIQAQL++ES+DVTLPGRQMTLGHRHVLTQTSEEIEDIFLGMG 

Sbjct: 96 DLLTKAFEEQAKIvEAAKIQAQIJ>AESIDvTLPGRQ^WIIGHRHVLTQTSEEIEDIFLGMG 155 

Query: 121 FQVVDGFEVEKDYYNFERMNLPKDHPARDMQDTFYITEEILLRTHTSPVQARTMDQHDFS 180 
FQ+VDGFEVEKDYYNFERMNLPKDHPARDMQDTFYITEEILLRTHTSPVQART+DQHDFS 
Sbjct: 156 FQIVDGFEVEKDYYNFERMNLPKDHPARDMQDTFYITEEILLRTHTSPVQARTLDQHDFS 215 

Query: 181 KGPLKMISPGRVFRRDTDDATHSHQFHQIEGLVVGENISMGDLKGTLQLISQKMFGAERK 240 

KGPLKM+SPGRVFRRDTDDATHSHQFHQIEGLVVG+NISMGDLKGTL++I +KMFG ER 
Sbjct: 216 KGPLKMVSPGRVFRRDTDDATHSHQFHQIEGLVVGKNISMGDLKGTLEMIIKKMFGDERS 275 


Query: 241 IRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKQTGWIEILGAGMVHPSVLEMSGIDSEK 300 

IRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCK+TGWIEILGAGMVHPSVLEMSG+D+++ 
Sbjct: 276 IRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKKTGWIEILGAGMVHPSVLEMSGVDAKE 335 

Query: 301 YSGFAFGLGQERIAMLRYGINDIRGFYQGDVRFTDQF 337 

YSGFAFGLGQERIAMLRYGINDIRGFYQGD RF++QF 
Sbjct: 336 YSGFAFGLGQERIAMLRYGINDIRGFYQGDQRFSEQF 372 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1404 

A DNA sequence (GBSx1489) was identified in S.agalactiae <SEQ ID 4311> which encodes the amino 
acid sequence <SEQ ID 4312>. Analysis of this protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2834 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>
The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1405 

A DNA sequence (GBSx1490) was identified in S.agalactiae <SEQ ID 4313> which encodes the amino 
acid sequence <SEQ ID 4314>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2762 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1406 

A DNA sequence (GBSx1491) was identified in S.agalactiae <SEQ ID 4315> which encodes the amino 
acid sequence <SEQ ID 4316>. This protein is predicted to be a DNA-entry nuclease. Analysis of this protein 
sequence reveals the following: 

Possible site: 13 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

A related GBS nucleic acid sequence <SEQ ID 8801> which encodes amino acid sequence <SEQ ID 8802> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 

McG: Discrim Score: 10.13 
GvH: Signal Score (-7.5): -5.07 

Possible site: 23 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -6.79 threshold: 0.0 

INTEGRAL Likelihood = -6.79 Transmembrane 8 - 24 ( 6 - 27) 
PERIPHERAL Likelihood = 6.26 258 
modified ALOM score: 1.86 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.3718 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>
The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA38134 GB:X54225 membrane nuclease [Streptococcus pneumoniae] 
Identities = 154/232 (56%), Positives = 180/232 (77%), Gaps = 1/232 (0%) 

Query: 41 KlWSGTPSRELSES^TSlWKKQLGTNIAVmQSGAFIINQNKTDIJJAKVSSAPyAINEIK 100 

K S PS+ L+ESVLT VK Q+ ++ WN SGAFI+N NKT+L+AKVSS PYA N+ K 
Sbjct: 43 KQASEAPSQAIMSVLTDAVKSQIKGSLEWNGSC3AFIVNGNKTNLDAKVSSKPYADNKTK 102 

Query: 101 KVNNQIVPTKANALLTKATRQYRNREETG^RTYWKPAGWHQINGLKGSYNHAVDRGHLI 160 
V + VPT ANALL+KATRQY+NR+ETGNG T W P GWHQ+ LKGSY HAVDRGHL+ 

Sbjct: 103 WGKETVPWANALLSKATRQYKNRKETGNGSTSM'PPGWHQVKNLKGSYTHAVDRGHLIi 162 

Query: 161 GYALVGSLRGFDASTSNPKNIATQAAWANQANSNQSTGQNYYETLVRKALDRHKTVRYRV 220 
GYAL+G L GFDASTSNPKNIA Q AWANQA + STGQNYYE+ VRKALD++K VRYRV 
Sbjct: 163 GYALIGGLDGFDASTSNPKNIAVQTAWANQAQAEYSTGQNYYESKVRKALDQNKRVRYRV 222 

Query: 221 TLIY-DRDNLLSSGSHIEAKSSDGSLEFNVFIPNVQSGLLFDYATGKVKQTK 271 

TL Y ++L+ S S IEAKSSDG LEFNV +PNVQ GL DY TG+V T+ 
Sbjct: 223 TLYYASNEDLVPSASQIEAKSSDGELEFNVLVPNVQKGLQLDYRTGEVTVTQ 274 


There is also homology to SEQ IDs 368 and 1302. 

SEQ ID 8802 (GBS285) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 56 (lane 6; MW 32kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 60 (lane 7; MW 57.5kDa). 

GBS285-GST was purified as shown in Figure 208 (lane 7) and Figure 225 (lane 8). 

GBS658 was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 134 (lane 8 & 9; MW 27kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1407 

A DNA sequence (GBSx1492) was identified in S.agalactiae <SEQ ID 4317> which encodes the amino 
acid sequence <SEQ ID 4318>. Analysis of this protein sequence reveals the following: 









Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 



Final Results 

bacterial outside Certainty=0.3000 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1408 

A DNA sequence (GBSx1493) was identified in S.agalactiae <SEQ ID 4319> which encodes the amino 
acid sequence <SEQ ID 4320>. This protein is predicted to be UDP-N-acetylglucosamine 1-carboxyvinyltransferase (murA). Analysis 
of this protein sequence reveals the following: 
Possible site: 43 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.1814 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

A related GBS nucleic acid sequence <SEQ ID 9767> which encodes amino acid sequence <SEQ ID 9768> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15693 GB:Z99122 UDP-N-acetylglucosamine 
1-carboxyvinyltransferase [Bacillus subtilis] 
Identities = 248/423 (58%), Positives = 323/423 (75%), Gaps = 5/423 (1%) 


Query: 5 MDKIIVEGGQTQLQGQVVIEGAKNAVLPLLAATILPSQGKTLLTNVPILSDVFTMNNVVR 64 

M+KIIV GGQ +L G V +EGAKNAVLP++AA++L S+ K+++ +VP LSDV+T+N V+R 
Sbjct: 1 MEKIIWGGQ-KLNGTVTCVEGAKNAVLPVIAASLLASEEKSVICDVPTLSDVYTINEVLR 59 

Query: 65 GLDIQVDFNCDKKEILVDASGDILDVAPYEFVSQMRASIVVLGPILARNGHAKVSMPGGC 124 

L V F + E+ V+AS + AP+E+V +MRAS++V+GP+LAR GHA+V++PGGC 
Sbjct: 60 HLGADVHF--ENNEVTVNASYALQTEAPFEYVRKMRASVLVMGPLLARTGHARVALPGGC 117 

Query: 125 TIGSRPIDLHLKGLEAMGATITQNGGDITAQAE-KLKGANIYMDFPSVGATQNLMMAATL 183 
IGSRPID HLKG EAMGA I G I A+ + +L+GA IY+DFPSVGAT+NL+MAA L 

Sbjct: 118 AIGSRPIDQHLKGFEAMGAEIKVGNGFIEAEVKGRLQGAKIYLDFPSVGATENLIMAAAL 177 

Query: 184 ASGTTTIENAAREPEIVDLAQLLNKMGAKVKGAGTETLTIIGVDALHGTEHDVVQDRIEA 243 
A GTTT+EN A+EPEIVDLA +N MG K++GAGT T+ I GV+ LHG +H ++ DRIEA 
Sbjct: 178 AEGTTTLENVAKEPEIVDLANYINGMGGKIRGAGTGTIKIEGVEKLHGVKHHIIPDRIEA 237 

Query: 244 GTFMVAAAMTSGNVLVKDAIWEHHRPLISKLMEMGVEVSEEEDGIRVKADTKKLKPVTVK 303 

GTFMVAAA+T GNVLVK A+ EH LI+K+ EMGV + +E +G+RV K+LKP+ +K 
Sbjct: 238 GTFMVAAAITEGNVLVKGAVPEHLTSLIAKMEEMGVTIKDEGEGLRV-IGPKELKPIDIK 296 


Query: 304 TLPHPGFPTDMQAQFTALMAVVNGESTMIETVFENRFQHLEEMRRMGLQTEILRDTAMIH 363 

T+PHPGFPTDMQ+Q AL+ +G S + ETVFENRF H EE RRM +I + +I + 
Sbjct: 297 TMPHPGFPTDMQSQMMALLLRASGTSMITETVFENRFMHAEEFRRMNGDIKIEGRSVIIN 356 

Query: 364 GGRALQGAPVMSTDLRASAALILAGMVAQGQTVVGQLTHLDRGYYQFHEKLAALGANIKR 423 

G LQGA V +TDLRA AALILAG+VA+G T V +L HLDRGY FH+KLAALGA+I+R 
Sbjct: 357 GPVQLQGAEVAATDLRAGAALILAGLVAEGHTRVTELKHLDRGYVDFHQKLAALGADIER 416 

Query: 424 VSE 426 
V++ 

Sbjct: 417 VND 419 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4321> which encodes the amino acid 
sequence <SEQ ID 4322>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.03 Transmembrane 377 - 393 ( 376 - 394) 

Final Results 

bacterial membrane Certainty=0.2211 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15693 GB:Z99122 UDP-N-acetylglucosamine 
1-carboxyvinyltransferase [Bacillus subtilis] 
Identities = 248/423 (58%), Positives = 318/423 (74%), Gaps = 5/423 (1%) 
Query: 1 VDKIIIEGGQTRLEGEVVIEGAKNAVLPLLAASILPSKGKTILKNVPILSDVFTMNNVVR 60 

++KII+ GGQ +L G V +EGAKNAVLP++AAS+L S+ K+++ +VP LSDV+T+N V+R 
Sbjct: 1 MEKIITOGGQ-KLNGWKVEGAKNAVLPVIAASLLASEEKSVICDVPTLSDVYTINEVLR 59 


Query: 61 GLDIRVDFNEAANEITVDASGHILDEAPYEYVSQMRASIVVLGPILARNGHAKVSMPGGC 120 

L V F NE+TV+AS + EAP+EYV +MRAS++V+GP+LAR GHA+V++PGGC 
Sbjct: 60 HLGADVHFEN--NEVTVNASYALQTEAPFEYVRKMRASVLVMGPLLARTGHARVALPGGC 117 

Query: 121 TIGSRPINLHLKGLEAMGATITQKGGDITAQAD-RLQGAMIYMDFPSVGATQNLMMAATL 179 

IGSRPI+ HLKG EAMGA I G I A+ RLQGA IY+DFPSVGAT+NL+MAA L 
Sbjct: 118 AIGSRPIDQHLKGFEAMGAEIKVGNGFIEAEVKGRLQGAKIYLDFPSVGATENLIMAAAL 177 

Query: 180 ADGVTTIENAAREPEIVDLAQFLNKMGARIRGAGTETLTITGVTHLRGVEHDVVQDRIEA 239 
A+G TT+EN A+EPEIVDLA ++N MG +IRGAGT T+ I GV L GV+H ++ DRIEA 

Sbjct: 178 AEGTTTLENVAKEPEIVDLANYINGMGGKIRGAGTGTIKIEGVEKLHGVKHHIIPDRIEA 237 

Query: 240 GTFMVAAAMTSGNVLIRDAVWEHHRPLISKLMEMGVSVTEEEYGIRVQANTPKLKPVTVK 299 
GTFMVAAA+T GNVL++ AV EH LI+K+ EMGV++ +E G+RV +LKP+ +K 

Sbjct: 238 GTFMVAAAITEGNVLVKGAVPEHLTSLIAKMEEMGVTIKDEGEGLRV-IGPKELKPIDIK 296 

Query: 300 TLPHPGFPTDMQAQFTALMAVVNGESTMVETVFENRFQHLEEMRRMGLQSEILRETAMIH 359 

T+PHPGFPTDMQ+Q AL+ +G S + ETVFENRF H EE RRM +I + +I+ 
Sbjct: 297 TMPHPGFPTDMQSQMMALLLRASGTSMITETVFENRFMHAEEFRRMNGDIKIEGRSVIIN 356 


Query: 360 GGRQLQGAPVMSTDLRASAALILTGIVAQGVTIVNNLVHLDRGYYQFHEKLAKLGATISR 419 

G QLQGA V +TDLRA AALIL G+VA+G TV L HLDRGY FH+KLA LGA I R 
Sbjct: 357 GPVQLQGAEVAATDLRAGAALILAGLVAEGHTRVTELKHLDRGYVDFHQKLAALGADIER 416 

Query: 420 SSE 422 

++ 

Sbjct: 417 VND 419 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 363/422 (86%), Positives = 391/422 (92%) 

Query: 5 MDKIIVEGGQTQLQGQVVIEGAKNAVLPLLAATILPSQGKTLLTNVPILSDVFTMNNVVR 64 

+DKII+EGGQT+L+G+VVIEGAKNAVLPLLAA+ILPS+GKT+L NVPILSDVFTMNNVVR 
Sbjct: 1 VDKIIIEGGQTRLEGEVVIEGAKNAVLPLLAASILPSKGKTILRNVPILSDVFTMNNVVR 60 


Query: 65 GLDIQVDFNCDKKEILVDASGDILDVAPYEFVSQMRASIVVLGPILARNGHAKVSMPGGC 124 

GLDI+VDFN EI VDASG ILD APYE+VSQMRASIVVLGPILARNGHAKVSMPGGC 
Sbjct: 61 GLDIRVDFNEAANEITVDASGHILDEAPYEYVSQMRASIVVLGPILARNGHAKVSMPGGC 120 

Query: 125 TIGSRPIDLHLKGLEAMGATITQNGGDITAQAEKLKGANIYMDFPSVGATQNLMMAATLA 184 

TIGSRPI+LHLKGLEAMGATITQ GGDITAQA++L+GA IYMDFPSVGATQNLMMAATLA 
Sbjct: 121 TIGSRPINLHLKGLEAMGATITQKGGDITAQADRLQGAMIYMDFPSVGATQNLMMAATLA 180 

Query: 185 SGTTTIENAAREPEIVDLAQLLNKMGAKVKGAGTETLTIIGVDALHGTEHDVVQDRIEAG 244 
G TTIENAAREPEIVDLAQ LNKMGA+++GAGTETLTI GV L G EHDVVQDRIEAG 

Sbjct: 181 DGVTTIENAAREPEIVDLAQFLNKMGARIRGAGTETLTITGVTHLRGVEHDVVQDRIEAG 240 

Query: 245 TFMVAAAMTSGNVLVKDAIWEHHRPLISKLMEMGVEVSEEEDGIRVKADTKKLKPVTVKT 304 
TFMVAAAMTSGNVL++DA+WEHHRPLISKLMEMGV V+EEE GIRV+A+T KLKPVTVKT 
Sbjct: 241 TFMVAAAMTSGNVLIRDAVWEHHRPLISKLMEMGVSVTEEEYGIRVQANTPKLKPVTVKT 300 

Query: 305 LPHPGFPTDMQAQFTALMAVVNGESTMIETVFENRFQHLEEMRRMGLQTEILRDTAMIHG 364 

LPHPGFPTDMQAQFTALMAVVNGESTM+ETVFENRFQHLEEMRRMGLQ+EILR+TAMIHG 
Sbjct: 301 LPHPGFPTDMQAQFTALMAVVNGESTMVETVFENRFQHLEEMRRMGLQSEILRETAMIHG 360 


Query: 365 GRALQGAPVMSTDLRASAALILAGMVAQGQTVVGQLTHLDRGYYQFHEKLAALGANIKRVSE 426 

GR LQGAPVMSTDLRASAALIL G+VAQG T+V L HLDRGYYQFHEKLA LGA I RSSE 
Sbjct: 361 GRQLQGAPVMSTDLRASAALILTGIVAQGVTIVNNLVHLDRGYYQFHEKLAKLGATISRSSE 422

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1409

A DNA sequence (GBSx1494) was identified in S.agalactiae <SEQ ID 4323> which encodes the amino
acid sequence <SEQ ID 4324>. Analysis of this protein sequence reveals the following: 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2096 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
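Each "Final Results" block lists one line per candidate localisation with a certainty score, and in every block in this section the single (Affirmative) entry is also the one with the highest certainty. As an illustration only, the Python sketch below reads such lines and returns the top-scoring site. The function names are hypothetical, and the sketch assumes the OCR spacing inside the numbers (e.g. "0 .2096") has already been removed.

```python
import re
from typing import List, Tuple

# Matches lines such as "bacterial cytoplasm Certainty=0.2096 (Affirmative)".
# Assumes OCR spacing inside the number has already been cleaned up.
LOCALISATION_RE = re.compile(r"bacterial (\w+).*?Certainty=([0-9.]+)\s*\(([^)]+)\)")


def parse_final_results(text: str) -> List[Tuple[str, float, str]]:
    """Return (site, certainty, verdict) for every localisation line in text."""
    hits = []
    for line in text.splitlines():
        match = LOCALISATION_RE.search(line)
        if match:
            site, score, verdict = match.groups()
            hits.append((site, float(score), verdict))
    return hits


def best_site(text: str) -> str:
    """Return the site with the highest certainty (the first one wins on ties)."""
    hits = parse_final_results(text)
    if not hits:
        raise ValueError("no localisation lines found")
    return max(hits, key=lambda hit: hit[1])[0]


# Final Results for SEQ ID 4324 (Example 1409), with OCR spacing removed.
example = """\
bacterial cytoplasm Certainty=0.2096 (Affirmative)
bacterial membrane Certainty=0.0000 (Not Clear)
bacterial outside Certainty=0.0000 (Not Clear)
"""
print(best_site(example))  # cytoplasm
```

Taking the maximum rather than looking for the word "Affirmative" keeps the sketch usable on blocks where every site is reported as (Not Clear), such as the one for SEQ ID 4328.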

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23756 GB:AB009314 proton-translocating ATPase, epsiron
subunit [Streptococcus bovis]
Identities = 102/138 (73%), Positives = 121/138 (86%), Gaps = 1/138 (0%)

Query: 1 MAQLTVQVVTPDGIRYDHHASLITVRTPDGEMGILPGHINLIAPLNVHQMKINRSHQEG- 59 

M +TVQVVTPDGIRYDHHA+ I+V+TPDGEMGILP HINLIAPL VH+MKI+R+
Sbjct: 1 MTFMTVQVVTPDGIRYDHHANFISVKTPDGEMGILPEHINLIAPLTVHEMKIHRTDDPNH 60


Query: 60 VDWAWGGIlEvNEDQOTIVADSAERARDIDMRAERAKERAERALEKAQTTQNIDEMR 119 

VDWVA+NGGIIE+ ++ VTIVADSAER RDID++RAERAK RAER LE+AQ+T +IDE+R
Sbjct: 61 VDWVAINGGIIEIKDNLVTIVADSAERERDIDVSRAERAKIRAERKLEQAQSTHDIDEVR 120 

Query: 120 RAEVALRRAINRISVGKK 137

RA+VALRRA+NRISVG K
Sbjct: 121 RAQVALRRALNRISVGNK 138 
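In these alignments the middle line carries the residue letter where Query and Sbjct are identical and "+" where the substitution is conservative, so the Identities and Positives figures can be tallied directly from it. The Python sketch below (hypothetical helper names) does this for the last block of the epsilon-subunit alignment above. It uses a midline re-derived from the Query and Sbjct rows under BLOSUM62 scoring, because the printed midline has lost its spacing in extraction. It also assumes the percentages are truncated rather than rounded, which fits figures in this document such as 391/422 being reported as 92% rather than 93%.

```python
from typing import Tuple


def count_from_midline(midline: str) -> Tuple[int, int]:
    """Return (identities, positives) read off a BLAST-style midline.

    A letter marks an identical residue, '+' a positive-scoring substitution,
    and a blank a mismatch or gap.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


def identities_from_pair(query: str, sbjct: str) -> int:
    """Count identical, non-gap columns in two equal-length aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def percent(count: int, length: int) -> int:
    """Integer percentage, truncated (assumed reporting convention)."""
    return count * 100 // length


# Query 120-137 against Sbjct 121-138 of the epsilon-subunit alignment.
query = "RAEVALRRAINRISVGKK"
sbjct = "RAQVALRRALNRISVGNK"
midline = "RA+VALRRA+NRISVG K"  # re-derived; K/N scores 0 in BLOSUM62, so blank

print(count_from_midline(midline))          # (15, 17)
print(identities_from_pair(query, sbjct))   # 15
print(percent(363, 422), percent(391, 422))  # 86 92
```

Counting from the Query and Sbjct rows gives the same identity count as the midline, which is a quick consistency check on an OCR-damaged block.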

A related DNA sequence was identified in S.pyogenes <SEQ ID 4325> which encodes the amino acid 
sequence <SEQ ID 4326>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence. 

Final Results 

bacterial cytoplasm Certainty=0.2539 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 100/138 (72%), Positives = 119/138 (85%), Gaps = 1/138 (0%)

Query: 1 MAQLTVQVVTPDGIRYDHHASLITVRTPDGEMGILPGHINLIAPLNVHQMKINRSHQ-EG 59

M Q+TVQVVTPDGI+YDHHA I+V TPDGEMGILP HINLIAPL VH+MKI R + E
Sbjct: 1 MTQMTVQVVTPDGIKYDHHAKFISVTTPDGEMGILPNHINLIAPLQVHEMKIRRGGEDEK 60


Query: 60 VDWAWGGIIEVNEDQOTIVADSAERARDIDIjNRAERAKERAERALEKAQTTQNIDEMR 119 

VDW+A+NGGIIE+ ++ VTIVADSAER RDID++RAERAK RAER + +A+TT NIDE+R
Sbjct: 61 VDWIAINGGIIEIKDNVVTIVADSAERDRDIDVSRAERAKLRAEREIAQAETTHNIDEVR 120

Query: 120 RAEVALRRAINRISVGKK 137

RA+VALRRA+NRI+V KK
Sbjct: 121 RAKVALRRALNRINVSKK 138 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1410

A DNA sequence (GBSx1495) was identified in S.agalactiae <SEQ ID 4327> which encodes the amino
acid sequence <SEQ ID 4328>. Analysis of this protein sequence reveals the following: 
Possible site: 60 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein is similar to the beta subunit of the S.mutans ATPase: 

>GP:AAD13383 GB:U31170 ATPase, beta subunit [Streptococcus mutans] 
Identities = 435/466 (93%), Positives = 455/466 (97%)

Query: 1 MSSGKIAQVVGPVVDVVFASGDKLPEINNALIVYKNGDKSQKVVLEVALELGDGLVRTIA 60
MS+GKIAQVVGPVVDV FA+ DKLPEINNAL+VYK+GDKSQ++VLEVALELGDGLVRTIA
Sbjct: 1 MSTGKIAQVVGPVVDVAFATDDKLPEINNALVVYKDGDKSQRIVLEVALELGDGLVRTIA 60

Query: 61 MESTDGLTRGLEVLDTGRAISVPVGKDTLGRVFNVLGDAIDLEEPFAEDAERQPIHKKAP 120
MESTDGLTRGLEV DTGRAISVPVGK+TLGRVFNVLGD IDL++PFAEDAERQPIHKKAP
Sbjct: 61 MESTDGLTRGLEVFDTGRAISVPVGKETLGRVFNVLGDTIDLDKPFAEDAERQPIHKKAP 120

Query: 121 SFDELSTSSEILETGIKVIDLIAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180
SFD+LSTS+EILETGIKVIDL+APYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS
Sbjct: 121 SFDDLSTSTEILETGIKVIDLLAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180

Query: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240
VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE
Sbjct: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240

Query: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI 300
GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI
Sbjct: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI 300

Query: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRALTPEIVGDEH 360
QAIYVPADDYTDPAPATAFAHLDSTTNLER+LTQMGIYPAVDPLASSSRAL+PEIVG EH
Sbjct: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERRLTQMGIYPAVDPLASSSRALSPEIVGQEH 360

Query: 361 YEVATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAETFTGQ 420
Y+VATEVQ VLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAE FTGQ
Sbjct: 361 YDVATEVQHVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAEQFTGQ 420

Query: 421 PGSYVPVEETVRGFKEILDGKHDQIPEDAFRMVGGIEDVIAKAEKM 466
PGSYVPV ETVRGFKEIL+GK+D++PEDAFR VG IEDV+ KA+KM
Sbjct: 421 PGSYVPVAETVRGFKEILEGKYDELPEDAFRSVGAIEDVVEKAKKM 466

A related DNA sequence was identified in S.pyogenes <SEQ ID 4329> which encodes the amino acid 
sequence <SEQ ID 4330>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0275 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 440/468 (94%), Positives = 456/468 (97%)
Query: 1 MSSGKIAQVVGPVVDVVFASGDKLPEINNALIVYKNGDKSQKVVLEVALELGDGLVRTIA 60
MSSGKIAQVVGPVVDV+FASGDKLPEINNALIVYK+ DK QK+VLEVALELGDG+VRTIA
Sbjct: 1 MSSGKIAQVVGPVVDVMFASGDKLPEINNALIVYKDSDKKQKIVLEVALELGDG^tVRTIA 60

Query: 61 MESTDGLTRGLEVLDTGRAISVPVGKDTLGRVFNVLGDAIDLEEPFAEDAERQPIHKKAP 120

MESTDGLTRGLEVLDTGRAISVPVGK+TLGRVFNVLG+ IDLEEPFAED +RQPIHKKAP
Sbjct: 61 MESTDGLTRGLEVLDTGRAISVPVGKETLGRVFNVLGETIDLEEPFAEDVDRQPIHKKAP 120

Query: 121 SFDELSTSSEILETGIKVIDLIAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180 

SFDELSTSSEILETGIKVIDLIAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS
Sbjct: 121 SFDELSTSSEILETGIKVIDLIAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180

Query: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240

VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE
Sbjct: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240 

Query: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI 300 

GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITST+KGSVTSI 
Sbjct: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTQKGSVTSI 300 

Query: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRALTPEIVGDEH 360 

QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRAL+PEIVG+EH 
Sbjct: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRALSPEIVGEEH 360

Query: 361 YEVATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAETFTGQ 420 

Y VATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAE FTG
Sbjct: 361 YAVATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAEQFTGL 420 

Query: 421 PGSYVPVEETVRGFKEILDGKHDQIPEDAFRMVGGIEDVIAKAEKMNY 468

PGSYVPV +TVRGFKEIL+GK+D++PEDAFR VG IEDVI KAEKM +
Sbjct: 421 PGSYVPVADTVRGFKEILEGKYDELPEDAFRSVGPIEDVIKKAEKMGF 468

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1411 

A DNA sequence (GBSx1496) was identified in S.agalactiae <SEQ ID 4331> which encodes the amino
acid sequence <SEQ ID 4332>. Analysis of this protein sequence reveals the following: 
Possible site: 31 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.1889 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23754 GB:AB009314 proton-translocating ATPase, gamma subunit 
[Streptococcus bovis] 
Identities = 252/293 (86%), Positives = 278/293 (94%), Gaps = 2/293 (0%) 

Query: 1 MAGSLSEIKDKILSTEKTSKITSAMQMVSSAKLVKSEQAARDFQVYASKIRQITTNLLKS 60
MAGSLSEIK KI+ST+KTS IT AMQMVS+AKL KSEQAA+DFQVYASKIRQITT+LLKS
Sbjct: 1 MAGSLSEIKGKIISTQKTSHITGAMQMVSAAKLTKSEQAAKDFQVYASKIRQITTDLLKS 60

Query: 61 DLVSGSDNPMLSSRPVKKTGYIVITSDKGLVGGYNSKILKAMMDTITDYHTENDDYAIIS 120
+LV+GS NPML++RPVKKTGYIVITSDKGLVGGYNSKILKAMMD I +YH ++ +YAII+
Sbjct: 61 ELVNGSKNPMLAARPVKKTGYIVITSDKGLVGGYNSKILKAMMDLIEEYH-QDGNYAIIA 119

Query: 121 IGSVGSDFFKARGMNVSFELRGLEDQPSFDQVGKIIAQAVEMYKNELFDELYVCYNHHVN 180
IG +G+DFFKARGMNV FELRGLEDQPSF+QVG IIA++VEMYKNELFDELYVCYNHHVN
Sbjct: 120 IGGIGADFFKARGMNVVFELRGLEDQPSFEQVGNIIAKSVEMYKNELFDELYVCYNHHVN 179

Query: 181 SLTSQVRMQQMLPIKELDAEEASEDRVITGFELEPNREVILEQLLPQYTESLIYGAIIDA 240
SLTSQVR+QQMLPI ELDA+EA+E+ V +GFELEPNRE+ILEQLLPQYTESLIYGAI+DA
Sbjct: 180 SLTSQVRVQQMLPIAELDADEAAEEGV-SGFELEPNREMILEQLLPQYTESLIYGAIVDA 238

Query: 241 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 293 

KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 
Sbjct: 239 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 291 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4333> which encodes the amino acid 
sequence <SEQ ID 4334>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1969 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 251/293 (85%), Positives = 275/293 (93%), Gaps = 2/293 (0%) 

Query: 1 MAGSLSEIKDKILSTEKTSKITSAMQMVSSAKLVKSEQAARDFQVYASKIRQITTNLLKS 60 

MAGSLSEIK KI+STEKTSKITSAM+MVSSAKLVKSEQAARDFQ+YASKIRQITT+LLKS 
Sbjct: 1 MAGSLSEIKAKIISTEKTSKITSAMRMVSSAKLVKSEQAARDFQIYASKIRQITTDLLKS 60 

Query: 61 DLVSGSDNPMLSSRPVKKTGYIVITSDKGLVGGYNSKILKAMMDTITDYHTENDDYAIIS 120 

+L GSDNPML SRPVKKTGYIVITSDKGLVGGYNSKILK++MD IT+YH + DY IIS 
Sbjct: 61 ELTIGSDNPMLVSRPVKKTGYIVITSDKGLVGGYNSKILKSVMDMITEYHADG-DYEIIS 119 

Query: 121 IGSVGSDFFKARGMNVSFELRGLEDQPSFDQVGKIIAQAVEMYKNELFDELYVCYNHHVN 180 

IGSVGSDFFKARGMNV+FELRGL DQPSF+QV +II+Q+V+M+ NE+FDELYVCYNHHVN
Sbjct: 120 IGSVGSDFFKARGMNVAFELRGLADQPSFEQVRQIISQSVDMFVNEIFDELYVCYNHHVN 179 

Query: 181 SLTSQVRMQQMLPIKELDAEEASEDRVITGFELEPNREVILEQLLPQYTESLIYGAIIDA 240 

SLTSQVR+QQMLPI +L A+EA+E+ V TGFELEPNR IL+QLLPQ+TESLIYGAIIDA 
Sbjct: 180 SLTSQVRVQQMLPISDLVADEAAEEGV-TGFELEPNRHDILDQLLPQFTESLIYGAIIDA 238 

Query: 241 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 293 

KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE
Sbjct: 239 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 291

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1412 

A DNA sequence (GBSx1497) was identified in S.agalactiae <SEQ ID 4335> which encodes the amino
acid sequence <SEQ ID 4336>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1963 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1413 

A DNA sequence (GBSx1498) was identified in S.agalactiae <SEQ ID 4337> which encodes the amino
acid sequence <SEQ ID 4338>. Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3146 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein is similar to the alpha subunit of the proton-translocating ATPase from S.bovis: 

>GP:BAA23753 GB:AB009314 proton-translocating ATPase, alpha subunit
[Streptococcus bovis] Length = 501

Identities = 482/501 (96%), Positives = 497/501 (98%)

Query: 1 MAINAQEISALIKKQIEDFQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEFSNGAY 60
MAINAQEISALIKKQIE+FQPNFDVTETG+VTYIGDGIARARGLDNAMSGELLEFSNGA+
Sbjct: 1 MAINAQEISALIKKQIENFQPNFDVTETGVVTYIGDGIARARGLDNAMSGELLEFSNGAF 60

Query: 61 GMAQNLESNDVGIIILGDFSEIREGDVVKRTGKIMEVPVGEAMIGRVVNPLGQPVDGLGE 120
GMAQNLESNDVGIIILGDFS IREGD VKRTGKIMEVPVGEA+IGRVVNPLGQPVDGLG+
Sbjct: 61 GMAQNLESNDVGIIILGDFSTIREGDEVKRTGKIMEVPVGEALIGRVVNPLGQPVDGLGD 120

Query: 121 IETTATRPVETPAPGVMQRKSVFEPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 180
I+TTATRPVETPAPGVMQRKSV EPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI
Sbjct: 121 IKTTATRPVETPAPGVMQRKSVSEPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 180

Query: 181 DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLLFIAPY 240
DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLL+IAPY
Sbjct: 181 DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLLYIAPY 240

Query: 241 AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER 300
AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER
Sbjct: 241 AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER 300

Query: 301 SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 360
SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG
Sbjct: 301 SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 360

Query: 361 SSVSRVGGAAQIKAMKRVAGTLRLDLASYRELEAFTQFGSDLDAATQAKLNRGRRTVEVL 420

bacterial cytoplasm Certainty=0.3654 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 477/501 (95%), Positives = 490/501 (97%)

Query: 1 MAINAQEISALIKKQIEDFQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEFSNGAY 60 
+AINAQEISALIKKQIE+FQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEF NGAY 
Sbjct: 1 LAINAQEISALIKKQIENFQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEFENGAY 60

Query: 61 GMAQNLESNDVGIIILGDFSEIREGDVVKRTGKIMEVPVGEAMIGRVVNPLGQPVDGLGE 120

GMAQNLESNDVGIIILGDFS IREGDVVKRTGKIMEVPVGEA+IGRVVNPLGQPVDGLG+
Sbjct: 61 GMAQNLESNDVGIIILGDFSAIREGDVVKRTGKIMEVPVGEALIGRVVNPLGQPVDGLGD 120


Query: 121 IETTATRPVETPAPGVMQRKSVFEPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 180 

IETT RPVETPAPGVMQRKSV EPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 
Sbjct: 121 IETTGFRPVETPAPGVMQRKSVSEPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 180 

Query: 181 DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLLFIAPY 240

DAILNQKGQDMICIYVAIGQKESTVRTQVETLR+YGALDYTIVVTASASQPSPLLFIAPY
Sbjct: 181 DAILNQKGQDMICIYVAIGQKESTVRTQVETLRRYGALDYTIVVTASASQPSPLLFIAPY 240

Query: 241 AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER 300 

AGVAMAEEFMY GKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER

Sbjct: 241 AGVAMAEEFMYQGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER 300 

Query: 301 SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 360 
SAKVSD LGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 
Sbjct: 301 SAKVSDDLGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 360

Query: 361 SSVSRVGGAAQIKAMKRVAGTLRLDLASYRELEAFTQFGSDLDAATQAKLNRGRRTVEVL 420

SSVSRVGG+AQIKAMK+VAGTLRLDLASYRELEAFTQFGSDLDAATQAKLNRGRRTVE+L
Sbjct: 361 SSVSRVGGSAQIKAMKKVAGTLRLDLASYRELEAFTQFGSDLDAATQAKLNRGRRTVEIL 420


Query: 421 KQPLHKPLPVEKQVVILYALTHGFLDDVPVNDILAFEEALYDYFDAHYDNLFETIRTTKD 480

KQPLHKPLPVEKQVVILYALTHGFLDDVPV+DILAFEEALYDYFD HY++LFETIRTTKD
Sbjct: 421 KQPLHKPLPVEKQVVILYALTHGFLDDVPVDDILAFEEALYDYFDVHYNDLFETIRTTKD 480

Query: 481 LPEEAELDAAIQAFKDQSQFK 501

LPEEA LDAAI+AFK+ S FK
Sbjct: 481 LPEEAALDAAIKAFKEHSNFK 501

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1414 

A DNA sequence (GBSx1499) was identified in S.agalactiae <SEQ ID 4341> which encodes the amino
acid sequence <SEQ ID 4342>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1896 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23752 GB:AB009314 proton-translocating ATPase, delta subunit
[Streptococcus bovis]
Identities = 98/178 (55%), Positives = 127/178 (71%)
Query: 1 MNKKTQALIEQYSKSLVEVAIEHKIVEKIQQEVAALIDIFETSELEGVLSSLAVSHDEKQ 60

M+KKTQAL+EQY+KSLVE+AIE + ++Q E AL+ +FE + L LSSL VS DEK
Sbjct: 1 MDKKTQALVEQYAKSLVEIAIEKDSIAELQSETEALLSVFEETNLADFLSSLVVSRDEKV 60

Query: 61 HFVKTLQTSCSTYLVNFLEVIVQNEREALLYPILKSVDQELIKVNGQYPIQITTAVALSP 120

V+ LQ S S Y+ NFLEVI+QNEREA L IL+ V ++ + Q+ I +TTAVAL+
Sbjct: 61 KIjTOLLQESSSVYMNNFLEVILQNEREAFLKAILEGVQKDFVIATNQHDIVVTTAVALTD 120 

Query: 121 EQKERLFDIAKTKIALPNGQLVEHIDPSIVGGFVVNANNKVIDASVRNQLHQFKMKLK 178 
EQKER+ + K + G+LVE+ID SI+GGFV+N NNKVID S+R QL +FKM LK

Sbjct: 121 EQKERIIALVAEKFGVKAGKLVEMIDESILGGFVINVNNKVIDTSIRRQLQEFKMNLK 178 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4343> which encodes the amino acid 
sequence <SEQ ID 4344>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1668 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/178 (48%), Positives = 125/178 (69%)


Query: 1 MNKKTQALIEQYSKSLVEVAIEHKIVEKIQQEVAALIDIFETSELEGVLSSLAVSHDEKQ 60

M KK QALIEQY+KSLVEVA EH ++ +Q +V A+++ F T+ L+ LSS AV H EK
Sbjct: 1 MTKKEQALIEQYAKSLVEVASEHHSLDALQADVIAILETFVTTNLDQSLSSQAVPHAEKI 60

Query: 61 HFVKTLQTSCSTYLVNFLEVIVQNEREALLYPILKSVDQELIKVNGQYPIQITTAVALSP 120

+ L+ + S Y+ NFL +I+QNEREA LY +L++V E+ V+ QY + +T+++ L+
Sbjct: 61 KLLTLLKGNNSVYMNNFLNLILQNEREAYLYQMLQAVLNEIAIVSNQTO 120

Query: 121 EQKERLFDIAKTKLALPNGQLVEHIDPSIVGGFVVNANNKVIDASVRNQLHQFKMKLK 178
EQK R+ + K A+ G+L+E +DPS++GGF+++ NNKVID S+R QL FKM LK

Sbjct: 121 EQKSRVRAVVAKKFAVTAGRLIEKVDPSLIGGFIISVNNKVIDTSIRRQLQAFKMNLK 178

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1415

A DNA sequence (GBSx1500) was identified in S.agalactiae <SEQ ID 4345> which encodes the amino
acid sequence <SEQ ID 4346>. This protein is predicted to be ATP synthase b chain (atpF). Analysis of 
this protein sequence reveals the following: 

Possible site: 33 
>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD13379 GB:U31170 ATPase, b subunit [Streptococcus mutans]
Identities = 103/165 (62%), Positives = 130/165 (78%)

Query: 1 MSILINSTTIGDIIIVSGSVLLLFILIKTFAWKQITGIFEAREQKIANDIDTAEQARQQA 60 

MS LIN T++G+++IV+GS +LL +L+K FAW Q+ IF+ RE+KIA DID AE +RQ A 
Sbjct: 1 MSTLINGTSLGNLLIVTGSFILLLLLVKKFAWSQLAAIFKTREEKIAKDIDDAENSRQNA 60 



Query: 61 EAFATKREEELSNAKTEANQIIDNAKETGIAKGDQIISEAKTEADRLKEKAHQDIAQNKA 120

+ KR+ EL+ AK EA QIIDNAKETG A+ +II+EA EA RLK+KA+QDIA +KA 
Sbjct: 61 QVLENKRQVELNQAKDEAAQIIDNAKETGKAQESKIITEAHEEAGRLKDKANQDIATSKA 120


Query: 121 EALADVKGEVADLTVLLAEKIMVSNLDKEAQSNLIDSYIKKLGDA 165

EAL+ VK +VADL+VLLAEKIM NLDK AQ +LIDSY+ KLGDA 
Sbjct: 121 EALSSVKADVADLSVLLAEKIMAKNLDKTAQGDLIDSYLDKLGDA 165 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4347> which encodes the amino acid
sequence <SEQ ID 4348>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have a cleavable N-term signal seq. 

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD13379 GB:U31170 ATPase, b subunit [Streptococcus mutans] 
Identities = 88/159 (55%), Positives = 122/159 (76%)

Query: 6 GELVGNFILVTGSVIVLLLLIKKFAWGAIESILQTRSQQISRDIDQAEQSRLSAQQLEAK 65
G +GN ++VTGS I+LLLL+KKFAW + +I +TR ++I++DID AE SR +AQ LE K

Sbjct: 7 GTSLGNLLIVTGSFILLLLLVKKFAWSQLAAIFKTREEKIAKDIDDAENSRQNAQVLENK 66 

Query: 66 SQANLDASRLQASKIISDAKEIGQLQGDKLVAEATDEAKRLKEKALTDIEQSKSDAISAV 125 
Q L+ ++ +A++II +AKE G+ Q K++ EA +EA RLK+KA DI SK++A+S+V 
Sbjct: 67 RQVELNQAKDEAAQIIDNAKETGKAQESKIITEAHEEAGRLKDKANQDIATSKAEALSSV 126

Query: 126 KTEMSDLTVLLAEKIMGANLDKTAQSQLIDSYLDDLGEA 164

K +++DL+VLLAEKIM NLDKTAQ LIDSYLD LG+A
Sbjct: 127 KADVADLSVLLAEKIMAKNLDKTAQGDLIDSYLDKLGDA 165


An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/156 (51%), Positives = 115/156 (72%)

Query: 10 IGDIIIVSGSVLLLFILIKTFAWKQITGIFEAREQKIANDIDTAEQARQQAEAFATKREE 69
+G+ I+V+GSV++L +LIK FAW I I + R Q+I+ DID AEQ+R A+ K +

Sbjct: 9 VGNFILVTGSVIVLLLLIKKFAWGAIESILQTRSQQISRDIDQAEQSRLSAQQLEAKSQA 68 

Query: 70 ELSNAKTEANQIIDNAKETGLAKGDQIISEAKTEADRLKEKAHQDIAQNKAEALADVKGE 129 
L ++ +A++II +AKE G +GD++++EA EA RLKEKA DI Q+K++A++ VK E 
Sbjct: 69 NLDASRLQASKIISDAKEIGQLQGDKLVAEATDEAKRLKEKALTDIEQSKSDAISAVKTE 128

Query: 130 VADLTVLLAEKIMVSNLDKEAQSNLIDSYIKKLGDA 165 

++DLTVLLAEKIM +NLDK AQS LIDSY+ LG+A 
Sbjct: 129 MSDLTVLLAEKIMGANLDKTAQSQLIDSYLDDLGEA 164 


SEQ ID 4346 (GBS169) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 34 (lane 6; MW 18kDa). 

The GBS169-His fusion product was purified (Figure 200, lane 11) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 250). These tests confirm that the protein is 
immunoaccessible on GBS bacteria.



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1416

A DNA sequence (GBSx1501) was identified in S.agalactiae <SEQ ID 4349> which encodes the amino
acid sequence <SEQ ID 4350>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.5692 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23750 GB:AB009314 proton-translocating ATPase, a subunit 
[Streptococcus bovis] 
Identities = 149/238 (62%), Positives = 180/238 (75%)

Query: 1 MESTSNPTVSFLGIDFDLTILAMSLLTITIIFILVFWASRKMTIKPKGKQNVLEYVYELV 60

ME++ NPT GI+FDLTILAMSLLT+ I F ++FWA+RKMT+KPKGKQN +EYVYE V 
Sbjct: 1 METSVNPTAHVFGIEFDLTILAMSLLTVIISFGIIFWATRKMTLKPKGKQNFIEYVYEFV 60 

Query: 61 NNTISQNLGHYTKNYSLLMFILFSFVFIANNLGLMTSLKTHEHNFWTSPTANFGVDITLS 120

NTI NLG YT YSLLMF F F+ IANNLGL+ L++ ++NFWTSPT+ VD T S
Sbjct: 61 QNTIKPNLGEYTPKYSLLMFTFFFFILIANNLGLLVKLESEDYNFWTSPTSTIMVDCTWS 120

Query: 121 LLVAFICHIEGIRKKGIGGYLKGFLSPTPAMLPMNLLEEVTNVASLALRLFGNIFSGEVV 180 

L+VA t H+EG+RKKG+ YLKG+LSP P MLPMN+LE+ TNV SLALRLFGNI++GEVV
Sbjct: 121 LIVAIWHVEGVRKKGVTAYLKGYLSPFPMMLPMNILEQFTNVLSLALRLFGNIYAGEVV 180

Query: 181 TGLLLQLAVLSPFTGPLAFALNIVWTAFSMFIGFIQAYVFIILSSSYIGHKVHGDEEE 238 

T L++ S PA ALN+ W AFS FIG IQAYVF ILSS YI K+ DE+E 

Sbjct: 181 TALIVGFGTKSLIFAPFALALNLAWVAFSAFIGCIQAYVFTILSSKYISEKLPEDEDE 238 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4351> which encodes the amino acid
sequence <SEQ ID 4352>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.73 Transmembrane 79 - 95 ( 72 - 97) 
INTEGRAL Likelihood = -4.35 Transmembrane 115 - 131 ( 112 - 132) 
INTEGRAL Likelihood = -2.13 Transmembrane 200 - 216 ( 197 - 216) 



Final Results 

bacterial membrane Certainty=0.2890 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
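The INTEGRAL lines reported for SEQ ID 4352 give, for each predicted transmembrane segment, a likelihood score, a residue range, and a second range in parentheses. The Python sketch below, with hypothetical names, makes those fields explicit. Reading the parenthesised pair as a wider boundary around the core segment is an assumption about this output format; it is not stated in the document.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Matches e.g. "INTEGRAL Likelihood = -4.73 Transmembrane 79 - 95 ( 72 - 97)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


@dataclass
class TransmembraneSegment:
    likelihood: float
    core: Tuple[int, int]      # first residue range on the line
    extended: Tuple[int, int]  # parenthesised range (assumed wider boundary)

    @property
    def core_length(self) -> int:
        start, end = self.core
        return end - start + 1  # inclusive residue range


def parse_integral(line: str) -> Optional[TransmembraneSegment]:
    """Parse one INTEGRAL line; return None if the line does not match."""
    match = TM_RE.search(line)
    if match is None:
        return None
    start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
    return TransmembraneSegment(float(match.group(1)), (start, end), (ext_start, ext_end))


lines = [
    "INTEGRAL Likelihood = -4.73 Transmembrane 79 - 95 ( 72 - 97)",
    "INTEGRAL Likelihood = -4.35 Transmembrane 115 - 131 ( 112 - 132)",
    "INTEGRAL Likelihood = -2.13 Transmembrane 200 - 216 ( 197 - 216)",
]
for segment in map(parse_integral, lines):
    print(segment.core, segment.core_length, segment.likelihood)
```

All three core ranges for SEQ ID 4352 come out at the same length, which is a convenient check that a line has been read correctly after OCR cleanup.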

An alignment of the GAS and GBS proteins is shown below. 

Identities = 124/239 (51%), Positives = 169/239 (69%), Gaps = 3/239 (1%)

Query: 1 MESTSNPTVSFLGIDFDLTILAMSLLTITIIFILVFWASRKMTIKPKGKQNVLEYVYELV 60

ME P + I F+LT+LA+ ++TI I+F VFWASR+M +KP+GKQ LEY+ V 
Sbjct: 1 MEEAKIPMLKLGPITFNLTLLAVCIVTIAIVFAFVFWASRQMKLKPEGKQTALEYLISFV 60 

Query: 61 NNTISQNLGH-YTKNYSLLMFILFSFVFIANNLGLMTSLKT-HEHNFWTSPTANFGVDIT 118 

+ ++L H K+YSLL+F +F FV +ANNLGL T L+T + +N WTSPTAN D+ 
Sbjct: 61 DGIGEEHLDHNLQKSYSLLLFTIFLFVAVANNLGLFTKLETVNGYNLWTSPTANLAFDLA 120
Query: 119 LSLLVAFICHIEGIRKKGIGGYLKGFLSPTPAMLPMNLLEEVTNVASLALRLFGNIFSGE 178

LSL + + HIEG+R++G+ +LK +P P M PMNLLEE TN SLA+RLFGNIF+GE
Sbjct: 121 LSLFITLMVHIEGVRRRGLVAHLKRLATPWP-MTPMNLLEEFTNFLSLAIRLFGNIFAGE 179

Query: 179 VVTGLLLQLAVLSPFTGPLAFALNIVWTAFSMFIGFIQAYVFIILSSSYIGHKVHGDEE 237

VVTGL++QLA + P+AF +N+ WTAFS+FI IQA+VF L+++Y+G KV+ EE 
Sbjct: 180 VWGLIVQIAOTRVYWWPIAFLV1#IAOTAFSVFISCIQAFVFTKLTATYLGKKVNESEE 238 

A related GBS gene <SEQ ID 8803> and protein <SEQ ID 8804> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 1 
McG: Discrim Score: -3.50 
GvH: Signal Score (-7.5): -3.36 

Possible site: 29 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 5 value: -11.73 threshold: 0.0 
modified ALOM score: 2.85

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.5692 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF01818 (301 - 1014 of 1314)

GP|2662321|dbj|BAA23750.1||AB009314 (1 - 238 of 239) proton-translocating ATPase, a subunit {Streptococcus bovis}
%Match = 35.0
%Identity = 62.2 %Similarity = 78.6
Matches = 148 Mismatches = 51 Conservative Sub.s = 39

XANCQTLMLPGVGFIERYFLRSICVYILSKIDDNLEKKEG*GLESTSNPTVSFLGIDFDLTILAMSLLTITIIFILVFWA
                                          METSVNPTAHVFGIEFDLTILAMSLLTVIISFGIIFWA

SRKMTIKPKGKQNVLEYVYELVNNTISQNLGHYTKNYSLLM

LSLLVAFICHIEGIRKKGIGGYLKGFLSPTPAMLPMNLLEEVTNVASLALRLFGNIFSGEVVTGLLLQLAVLSPFTGPLA

FALNIVWTAFSMFIGFIQAYVFIILSSSYIGHKVHGDEEE*EKRGEICQYLLIVQRLVISLSYLALCFSYLS*LRLLHGN
LALNLAWVAFSAFIGCIQAYVFTILSSKYISEKLPEDEDET

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1417 

A DNA sequence (GBSx1502) was identified in S.agalactiae <SEQ ID 4353> which encodes the amino
acid sequence <SEQ ID 4354>. This protein is predicted to be ATP synthase c subunit (atpE). Analysis of 
this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.62 Transmembrane 48 - 64 ( 42 - 65) 



Final Results 

bacterial membrane Certainty=0.2848 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23749 GB:AB009314 proton-translocating ATPase, c subunit [Streptococcus bovis] 
Identities = 56/65 (86%), Positives = 59/65 (90%)

Query: 1 MNLAILALGFAVMGVSIGEGILVANIAKSAARQPEMFSKLQTLMFTGVAFIEGTFFVLFA 60

+NL ILALG AV+GVS+GEGILVANIAKSAARQPEMFSKLQTLMF GVAFIEGTFFVL A 
Sbjct: 2 I^KIIALGLAVLGVSLGEGILVANIAKSAARQPEMFSKLQTLMFLGVAFIEGTFFVLLA 61 

Query: 61 FTFLV 65 
TF V

Sbjct: 62 STFFV 66 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4355> which encodes the amino acid 

sequence <SEQ ID 4356>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -5.26 Transmembrane 47 - 63 ( 41 - 64) 

Final Results 

bacterial membrane Certainty=0.3102 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD00920 GB:AF001955 UncE [Streptococcus sanguinis]

Identities = 50/66 (75%), Positives = 58/66 (87%), Gaps = 1/66 (1%)

Query: 1 MNPIF-ALALACFGVSLAEGFLMANLFKAASRQPEIIGQLRSLMILGVAFIEGTFFVTLV 59
MN F L ACFGVS+AEG +M+NLFKAASRQPEIIGQLRSL+ILG+AF+EGTFFVTL
Sbjct: 1 MNLTFLGLCFACFGVSIAEGLIMSNLFKAASRQPEIIGQLRSLLILGIAFVEGTFFVTLA 60

Query: 60 MAFILK 65 

MAF++K 
Sbjct: 61 MAFVIK 66 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 33/62 (53%), Positives = 45/62 (72%)

Query: 5 ILALGFAVMGVSIGEGILVANIAKSAARQPEMFSKLQTLMFTGVAFIEGTFFVLFAFTFLVR 66 
I AL A GVS+ EG L+AN+ K+A+RQPE+ +L++LM GVAFIEGTFFV F+++

Sbjct: 4 IFALALACFGVSLAEGFLMANLFKAASRQPEIIGQLRSLMILGVAFIEGTFFVTLVMAFILK 65
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1418 

A DNA sequence (GBSx1503) was identified in S.agalactiae <SEQ ID 4357> which encodes the amino
acid sequence <SEQ ID 4358>. Analysis of this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2562 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1419 

A DNA sequence (GBSx1504) was identified in S.agalactiae <SEQ ID 4359> which encodes the amino
acid sequence <SEQ ID 4360>. This protein is predicted to be bacterial glycogen synthase (glgA). Analysis
of this protein sequence reveals the following:
Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1574 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA19591 GB:D87026 bacterial glycogen synthase [Bacillus
stearothermophilus]
Identities = 220/475 (46%) , Positives = 312/475 (65%) , Gaps = 1/475 (0%) 

Query: 1 MKIMFVAAEGAPFAKTGGLGDVIGALPKSLSKKGHDVAVVMPYYDMVDQKFGDQIENLMY 60

MK++F +E APFAK+GGL DV GALPK L + G D V++P Y+ + ++ +++ +
Sbjct: 1 MKVLFAVSECAPFAKSGGLADVAGALPKELRRLGIDARVMLPKYETIAPEWKKKMKKVAE 60 

Query: 61 FYTDVGTOHQWGVKRLSQDNOTFYFIDNQYYFYRGHVYGDWDDGERFAYFQLAALELME 120 
VGWR QY GV+ L D V +YFIDN+YYF R +YG +DDGERFAYF A LE++

Sbjct: 61 LIVPVGWRRQYCGVEELRHDGVIYYFIDNEYYFKRPQLYGHYDDGERFAYFCRAVLEVLP 120

Query: 121 KIDFIPDVLHVHDYHTAMIPFLLKEKYHWIQAYNNIRAVFTIHNIEFQGQFGPEMLGDLF 180 
+I F PDV+H HD+HT M+PFLL+E+Y Y ++R VFTIHN++FQG F +L DL

Sbjct: 121 EIQFQPDVIHCHDWHTGMVPFLLREQYRHELFYVDMRTVFTIHNLQFQGLFPRGILEDLL 180

Query: 181 GVGAERYEDGTLRWNNCIjNWMKAAILYSDRVTWSPSYANEIKTPEFGKGLDQIMRMEAG 240 

+ + L + C+++MK A++ SD +TTVSP+Y EI+T +G+ LD ++R 
Sbjct: 181 NLDGRYFTVDHLEFYGCTSFMKGALVASDLITTVSPTYKEEIQTAYYGERLDGLLRARRD 240






Query: 241 KLSGIVNGIDSDLLNPETDAFLPYHFSKSNLEGKIKNKLALQENLGLPQDKNVPLIGIVS 300

L GI+NGID + NPE D FL +S E K NK ALQ GLP+ +VPLI +V+ 
Sbjct: 241 DLLGILNGIDDEFYNPEADPFLTATYSVHTRERKQLNKRALQRQFGLPEWDDVPLIAMVT 300
Query: 301 RLTDQKGFDIIASELDMMLQQDIQMVILGTGYHHFEETFSYFASRYPEKLSANITFDLRL 360

R+T QKG D++ M+ +D+Q+V+LGTG FE+ FS A+ YP K+ IF L 

Sbjct: 301 RMTAQKGLDLVTCVFHEMMSEDMQLVVI^TGDmFF^FFSQMRAAYPGKVGVYIGFHEPL 360 

Query: 361 AQQIYAASDIFMMPSAFEPCGLSQMMAMRYGSLPLVHEVGGLKDTWAFNQFDGSGTGFS 420

A QIYA +D+F++PS FEPCGLSQM+A+RYG++P+V E GGL DTV ++N+ G GFS 
Sbjct: 361 AHQIYAGADLFLIPSLFEPCGLSQMIALRYGTIPIVRETGGLNDTVQSYWEITKEGNGFS 420 

Query: 421 FOTFSGYWLMQTLKIALEVYNDYPEAWKKLQWQAMSKDFSVroTACVAYEQIiYQQL 475 
F +F+ + ++ T++ AL Y P W++L +AM D+SW + Y+Q Y+QL

Sbjct: 421 FTNFNAHDMLYTIRRALSFYRQ-PSVWEQLTERAMRGDYSWRRSANQYKQAYEQL 474 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1420 

A DNA sequence (GBSx1505) was identified in S.agalactiae <SEQ ID 4361> which encodes the amino
acid sequence <SEQ ID 4362>. This protein is predicted to be a subunit of ADP-glucose 
pyrophosphorylase. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3492 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA19590 GB:D87026 subunit of ADP-glucose pyrophosphorylase
[Bacillus stearothermophilus]

Identities = 59/178 (33%) , Positives = 111/178 (62%) , Gaps = 1/178 (0%) 

Query: 37 SAEIYVIDTPWLIEKMEEEAQNNEPRKLRFLLRDLIVESNALAFEYTGYLSNISSIKSYY 96 
S E+Y+++T L++ + + +N+ + ++RD + +EY+GY + I S++ Y+ 

Sbjct: 157 SLEMYLLETSLLLDLIADY-KNHGYYSIVDVIRDYHRSLSICEYEYSGYAAVIDSVEQYF 215






Query: 97 DANMDMLTPNKFYSLFFSNQKVYTKVKNEEATYFDKQSNVSNSQLASGSIIKGYLDHSIV 156 

++M++L + + LF + +YTKVK+E T + ++ NV S +A+G +I+G +++S++ 
Sbjct: 216 RSSMELLDRDVWEQLFLPSHPIYTKVKDEPPTKYGREGNVKRSMIANGCVIEGTVENSVL 275

Query: 157 SRNCLLEKGTRvVNSIIFPKVKIGEGATIFJWIIDKCOTCvASGvTLKGSLDKPLVIPK 214 

R+ + KG V NSII K +IG+G ++ IIDK KV GV LKG+ ++P ++ K 
Sbjct: 276 FRSVKIGKGAVVRNSIIMQKCQIGDGCVLDGVIIDKDAKVEPGVVLKGTKEQPFIVRK 333



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1421 

A DNA sequence (GBSx1506) was identified in S.agalactiae <SEQ ID 4363> which encodes the amino
acid sequence <SEQ ID 4364>. This protein is predicted to be a subunit of ADP-glucose pyrophosphorylase
(glgC-1). Analysis of this protein sequence reveals the following:
Possible site: 32

>>> Seems to have an uncleavable N-term signal seq
Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9765> which encodes amino acid sequence <SEQ ID 9766> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA19589 GB:D87026 subunit of ADP-glucose pyrophosphorylase

[Bacillus stearothermophilus] 
Identities = 195/352 (55%) , Positives = 259/352 (73%) 

Query: 7 MKNEMLALILAGGQGTRLGKLTQSIAKPAVQFGGRYRIIDFALSNCANSGINNVGVITQY 66
MK + +A++LAGGQG+RL LT +IAKPAV FGG+YRIIDF LSNC NSGI+ VGV+TQY

Sbjct: 1 MKKKCIAMLLAGGQGSRLRSLTTNIAKPAVPFGGKYRIIDFTLSNCTNSGIDTVGVLTQY 60 

Query: 67 QPLELNTHIGNGSSWGLDGIDSGVTVLQPYSATEGNRWFQGTSHAIYQNIDYIDRINPEY 126 
QPL L+++IG GS+W LD + GVTVL PYS + G +W++GT++A+YQNI+YI++ NP+Y 
Sbjct: 61 QPLLLHSYIGIGSAWDLDRRNGGVTVLPPYSVSSGVKWYEGTANAVYQNINYIEQYNPDY 120

Query: 127 VLILSGDHIYKMNYDDMLQTHKDNLASLTVAVLDVPLKEASRFGIMNTDSNDRIVEFEEK 186 

VL+LSGDHIYKM+Y ML H A +T++V++VP +EASRFGIMNT+ IVEF EK 
Sbjct: 121 VliVIiSGDHIYKMDYQHMLDYHIAKQADOTISVIEvPVffiEASRFGIMNTNEEMEI VEFAEK 180 


Query: 187 PEHPKSTKASMGIYIFDWKRLRTVLIDGEKNGIDMSDFGKNVIPAYLESGERVYTYNFDG 246

P PKS ASMGIYIF+W L+ L N DFGK+VIP L +R + Y F+G 

Sbjct: 181 PAEPKSNLASMGIYIFNWPLLKQYLQIDNANPHSSHDFGKDVIPMLLREKKRPFAYPFEG 240 

Query: 247 YWKDVGTIESLWEANMEYIGEDNKLHSRDRSWKIYSKl^IAPPNFMTEDANvlCDSLvVDG 306

YWKDVGT++SLWEANM+ + E+N+L DRSW+IYS N PP +++ +A V DSLV +G 
Sbjct: 241 YWKDVGTVKSLWEANMDLLDENNELDLFDRSWRIYSVNPNQPPQYISPEAEVSDSLVNEG 300 

Query: 307 CFVAGNVEHSILSTNVQVKPNAIIKDSFVMSGATIGEGAKINRAIIGEDAVI 358 
C V G VE S+L V++ A++K+S +M GA + EGA + RAI+ D++I

Sbjct: 301 CVVEGTVERSvLFQGTOIGKGAWKESVIMPGAAVSEGAYVERAIVTPDSII 352 

There is also homology to SEQ ID 2660. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1422 

A DNA sequence (GBSx1507) was identified in S.agalactiae <SEQ ID 4365> which encodes the amino
acid sequence <SEQ ID 4366>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2844 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA78440 GB:Z14057 1,4-alpha-glucan branching enzyme [Bacillus
caldolyticus] 

Identities = 272/616 (44%) , Positives = 371/616 (60%) , Gaps = 14/616 (2%)



Query: 6 ELYTFGIGENFHLQNYLGVHSENGSFC FRVWAPNAENVQVIGDFTDWRNRPLQMNK 61 

E+Y F G + G H G F VWAP+A V+++G F DW + K 
Sbjct: 10 EVYLFHEGRLYQSYELFGAHVIRGGG!AVGTRFCWRPHaREVRLVGSF^roWNG'r^ISPLTK 69

Query: 62 -NQAGVWEANSLDAREGDLYKyLVTRKGGQVVEKIDPMAVYMERRPGTASVIKVLRNKM 120
N GVW + EG LYKY + G+V+ K DP A Y E RP TAS++ L+ +W
Sbjct: 70 VITOEGVWTIWPEI^EGHLYKYEIITPDGRVLLKADPYAFYSELRPHTASIWDLKGYE^ 129

Query: 121 EDGLVMGRRKRLGFQKRPIMIYEVHAGSWKTODFGHPMTFSQLKDYLIPYLVEMKYTHVE 180
D W ++4-R +P+ IYE+H GSWKK G T+ ++ D LIPY++E +TH+E
Sbjct: 130 HDSPWQRKKRRKRIYDQPMVIYELHFGSWKKKPIX3RFYTYREMADELIPYVLERGFTHIE 189

Query: 181 FMPLMAHPLDMSWGYQLMGYFAFEHTYGTPEEFQDEVEACHKNNIGVLVDWVPGHFIQHD 240
+PL+ HPLD SWGYQ GY++ YGTP +F FV+ CH+ +GV++DWVPGHF ++
Sbjct: 190 LLPLVEHPLDRSWGYQGTGYYSVTSRYGTPHDFMYFVDRCHQAGLGVIIDWVPGHFCKDA 249

Query: 241 DAIAYFDGTATYEYQIfflDRAHtTYRWGALNFDLGKNQ 300
L FDG TYEY N NY WG WFDLGK +V+SFLIS+ALFW+E+YB+DG RVD
Sbjct: 250

Query: 301
AV+NMLY ++ +E N FLR+LN+ + PMV MIAE+ST +T
Sbjct: 310 JSMLYWPNNDRLYE - -NPYAVEFLRQtMEAVFAYDPNVWMIAEDSTDWPRVT 361

Query: 361
GGLGF++KWNMGWMND+L++ E P R+Y N V+FS +Y ++ENF+L FSHDE
Sbjct: 362

Query: 421
WHGKKS+++KM G +FA hR LY Y M HPGKKLLFMGSEF QF EWK+ ++L+W
Sbjct: 422

Query: 481
+ 4+++KM Y KQL YK +K + +D G E ID N +++ SFIR+ K G
Sbjct: 482

Query: 540
D+L+ V N T ++ + VP Y EVLN++ EFGG
Sbjct: 542

Query: 600
+ + T+P G St R
Sbjct: 602 YHVRMTIPPFGISILR 617

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1423

A DNA sequence (GBSx1508) was identified in S.agalactiae <SEQ ID 4367> which encodes the amino
acid sequence <SEQ ID 4368>. This protein is predicted to be pullulanase (pulA). Analysis of this protein 
sequence reveals the following: 

Possible site: 45 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3194 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAC44685 GB:U67061 pullulanase [Bacteroides thetaiotaomicron] 
Identities = 223/597 (37%), Positives = 331/597 (55%), Gaps = 55/597 (9%)

Query: 139 EYSETKTAFRLWAPTAERVELILYHSTDETASVSKVLSMKRGTA 198 
EY+ T F LW+PTA+ V L+LY + E + + M+G GW
Sbjct: 46 EYTPKATKFTLWSPTADEVRLMLYEA-GEGGHAYETVKMQSGE- -EGTWTA 93 

Query: 199 ELEGNYNYQAYTYRVYYRRRTFKITRDPYSIATTANGKRSIVIAPEALTPEGFKISHGKE 258 

+ + + YT+ V + T + A NGKR+ +1 ++ P+G++ + 
Sbjct: 94 WSKDLIGKFYTFNVKIDDKWQGDTPGINARAVGVNGKRAAIIDWQSTKPDGWE SD 149 

Query: 259 AKmLENPNQAVIYEMHWDFSISETSGVKTDYHGKTKGLHQKGTVNQHGDKTTFDYVQD 318 

+ L++P +IYEMH RDFS+ TSGVK GK+ L + GT+N T D++ + 

Sbjct: 150 TRPPLKSPADMIIYEMHHRDFSVDSTSGVKNK--GKYLALTEHGTMNSDKLLTGIDHLIE 207 

Query: 319 LGVOTIQLQPIFDHHQTFDDD-GHYAYNWGYDPENYNVPEASFSSNPHEPATRILELKSA 377

LGV ++ L P FD+ + +YNWGYDP+NYNVP+ S++++P++PATR+ E K 

Sbjct: 208 LGVTHVHLLPSFDYASVDETRLNENSYNWGYDPQNYNVPDGSYATDPYQPATRVKEFKQM 267 

Query: 378 IQAYHDAGIGVIMDVVYNHTFSSTDSAFQLTVPDYYYRMNHNGTFQNGSGCGNETASEKE 437

+QA H AGI VIMDVVYNHTF++ +S F+ TVP Y+YR + T NGSGCGNETASE+
Sbjct: 268 VQALHKAGIRVIMDVVYNHTFNTDESNFERTVPGYFYRQKEDKTLANGSGCGNETASERL 327 

Query: 438 MCRKYILDSVLYWVKEYNIDGFRFDLMGLHDVETMNIIRNELNKIDPRILVYGEGWDMGA 497 

M RK++++SVLYW+KEY++DGFRFDLMG+HD+ETMN IR +N +DP I +YGEGW A 
Sbjct: 328 MMRKFMVESVLYWIKEYHVDGFRFDLMGIHDIETMNEIRKAVNAVDPTICIYGEGWAAEA 387

Query: 498 GLTPQNK-AKKDNAYQMPGIGFFNDDVRDAV KGAEIYGEFKKGLVSGNSTEDIVAKG 553 

P + A K N Q+PG+ F+D++RD + G+GFG+G EVG 
Sbjct: 388 PQYPADSLAMKGNIAQIPGVAVFSDELRDGLCGPVGDKRKGAFLAGIPGG EMSVKFG 444 

Query: 554 ILGSDE LVSYI DPSQVLNYVEAHDNYNIiNDLLWELHPNDNEKQHIYR 600 

I G+ E V+Y P Q+++YV HD L D L P+ +Q I 

Sbjct: 445 IAGAIEHPQVQCDSVNYTQKPWAKQPVQMISYVSCHDGLCLVDRLKA.SMPDITPEQLIRL 504 

Query: 601 VEVASAMNLLMQGMAFMQLGQEFLRTKCTPTGDKGQLTQADKERftMNSYNAPDQWQVNW 660 

++A A+ QG+ F+ G+E +R DK+ NSY +PD VN ++W 

Sbjct: 505 DKLAQAWFTSQGIPFIYAGEEIMR DKQGVDNSYKSPDAVNAIDW 549 

Query: 661 DNVTFHKSTINFIRKIITLKTNSPYFSYSSFEEIRKHVFVESAQYHSGFISFTVEEH 717 

T + +++I L+ + P F ++RKH+ + S I+F +++H 

Sbjct: 550 RRKTTSADVFMYYKRLIDLRKSHPAFRMGDAGQVRKHLEFLPVE-GSNLIAFRLKDH 605 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1424 

A DNA sequence (GBSx1509) was identified in S.agalactiae <SEQ ID 4369> which encodes the amino
acid sequence <SEQ ID 4370>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.2368 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12492 GB:Z99107 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 151/293 (51%) , Positives = 193/293 (65%) , Gaps = 5/293 (1%) 

Query: 5 KKARLIYNPTSGQEIMKKNVAEVLDILEGFGYETSAFQTTPTKNSARDEATRAAQAGFDL 64

K+AR+IYNPTSG+EI KK++A+VL E GYETS TT A A AA FDL 

Sbjct: 2 KRARIIYNPTSGREIFKKHLAQVLQKFEQAGYETSTHATT-CAGDATHAAKEAALREFDL 60 



Query: 65 IVAAGGDGTINEVVNGIAPLKRRPKMAIIPTGTTNDFARALKIPRGNPIEATKLIGKNQI 124
I+AAGGDGTINEVVNG+APL RP + +IP GTTNDFARAL IPR + ++A +
Sbjct: 61 IIAAGGDGTINETVTOGIAPLD1TOPTW3VIPVGTTNDFARALGIPREDILKAADTVINGVA 120

Query: 125 VKMDIGQAQEDNYFINIAAAGSLTELTYSVPSQLKTTFGYLAYLAKGVELLPRVRKVPVK 184

+DIGQ YFINIA G LTELTY VPS+LKT G LAY KG+E+LP +R V+ 

Sbjct: 121 RPlDIGQVN-GQYFINIAGGGRLTELTYDVPSKLKTMLGQIAyyLKGMEMliPSLRPTEVE 179 

Query: 185 ITHDKGEFIGDASMIFVAITNSVGGFEQIAPDAKLDDGKFTLILVKTANLIEIMHLIRLV 244

I +D F G+ + V +TNSVGGFE++APD+ L+DG F L+++K ANL E + + + 
Sbjct: 180 IEYDGKLFQGEIMLFLVTLTNSVGGFEKLAPDSSLNIXSMFDLMILKBCAinjAEFIRVAT^ 239 

Query: 245 LAGGKHINDKRVEYIKTSYLTIEPLSDERMMINLDGEYGGDAPITLANLKNHI 297

L G+HIND+ + Y K + + + E+M +NLDGEYGG P NL HI 

Sbjct: 240 LR-GEHINDQHIIYTKANRVKVK--VSEKMQLNLDGEYGGMLPGEFVNLYRHI 289

A related DNA sequence was identified in S.pyogenes <SEQ ID 4371> which encodes the amino acid
sequence <SEQ ID 4372>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.2501 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 272/334 (81%) , Positives = 300/334 (89%) 

Query: 1 MKKQKKARLIYNPTSGQEIMKKNVAEVLDILEGFGYETSAFQTTPTKNSARDEATRAAQA 60

MKKQ +ARLIYNPTSGQE+M+K+V EVLDILEGFGYETSAFQTT KNSA +EA RAA+A 
Sbjct: 1 MKKQLRARrjIYNPTSGQEL^KSVPEVLDI&EGra 60 

Query: 61 GFDLIVAAGGDGTINEVVNGIAPLKRRPKMAIIPTGTTNDFARALKIPRGNPIEATKLIG 120

GFDL++AAGGDGTINEVVNGIAPLK+RPKMAIIPTGTTNDFARALK+PRGNP +A KLIG
Sbjct: 61 GFDLLIAAGGDGTINEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIG 120

Query: 121 KNQIVKMDIGQAQEDNYFINIAAAGSLTELTYSVPSQLKTTFGYLAYLAKGVELLPRVRK 180

KNQ ++MDIG+A++D YFINIAAAGSLTELTYSVPSQLKT FGYLAYLAKGVELLPRV
Sbjct: 121 KNQTIQMDIGRAKKDTYFINIASAGSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSN 180

Query: 181 VPVKITHDKGEFIGDASMIFVAITNSVGGFEQIAPDAKLDDGKFTLILVKTANLIEIMHL 240 

VPVKITHDKG F G SMIF AITNSVGGFE IAPDAKLDDG FTLIL+KTANL EI+HL 
Sbjct: 181 VPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIAPDAKLDDGMFTLILIKTANLFEIVHL 240 

Query: 241 IRLVLAGGKHINDKRVEYIKTSYLTIEPLSDERMMINLDGEYGGDAPITLANLKNHIRFF 300

+RL+L GGKHI D+RVEYIKTS + IEP +RMMINLDGEYGGDAPITL NLKNHI FF 
Sbjct: 241 LRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGGDAPITLENLKNHITFF 300 

Query: 301 ANTDEISDDALVLDKDELAIEAIAQKFANEVDDL 334

A+TD ISDDALVLD+DEL IE I +KFA+EV+DL 
Sbjct: 301 ADTDLISDDALVLDQDELEIEEIVKKFAHEVEDL 334

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1425 

A DNA sequence (GBSx1510) was identified in S.agalactiae <SEQ ID 4373> which encodes the amino
acid sequence <SEQ ID 4374>. This protein is predicted to be DNA ligase (ligA-1). Analysis of this protein
sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.27 Transmembrane 363 - 379 ( 363 - 379)



Final Results 

bacterial membrane Certainty=0.1107 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9763> which encodes amino acid sequence <SEQ ID 9764> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12482 GB:Z99107 similar to DNA ligase [Bacillus subtilis] 
Identities = 346/657 (52%) , Positives = 462/657 (69%) , Gaps = 8/657 (1%) 

Query: 2 ENRMNELVSLLNQYAKEYYTQDNPTVSDSQYDQLYRELVELEKQHPENILPNSPTHRVGG 61 
+ R EL +N+Y+ EYYT D P+V D++YD+L +EL+ +E++HP+ P+SPT RVGG

Sbjct: 7 KQRAEELRRTINKYSYEYYTLDEPSVPDAEYDRLMQELIAIEEEHPDLRTPDSPTQRVGG 66 

Query: 62 LVLEGFEKYQHEYPLYSLQDAFSKEELIAFDKRVKAEF-PTAAYMAELKIDGLSVSLTYV 120 
VLE F+K H P+ SL +AF+ ++L FD+RV+ AY ELKIDGL+VSL Y 

Sbjct: 67 AVLEAFQKVTHGTPMLSLGNAFNADDLRDFDRRVRQSVGDDVAYNVELKIDGLAVSLRYE 126

Query: 121 NGVLQVGATRGDGNIGENITENLKRVHDIPLHLDQSLDITVRGECYLPKESFEAINIEKR 180
+G GATRGDG GE+ITENLK + +IPL +++ L I VRGE Y+PK SFEA+N E+ 
Sbjct: 127 DGYFTOGATRGDGTTGEDITENLKTIRNIPLKMNRELSIEVRGEAYMPKRSFEALNEERI 186 


Query: 181 ANGEQEFANPRNAAAGTLRQLNTGIVAKRKLATFLYQEASPTQK--ETQDDVLKELESYG 238 

N E+ FANPRNAAAG+LRQL+ I AKR L F+Y A + ETQ L L+ G 
Sbjct: 187 KNEEEPFANPRNAAAGSLRQLDPKIAAKRNLDIFVYSIAELDEMGVETQSQGLDFLDELG 246

Query: 239 FSVNHHRLISSSMEKIWDFIQTIEKDRVSLPYDIDGIVIKVNSIAMQEELGFTVKAPRWA 298

F N R S+E++ I ++ R LPY+IDGIVIKV+S+ QEELGFT K+PRWA 
Sbjct: 247 FKTNQERKKCGSIEEVITLIDELQAKRADLPYEIDGIVIKVDSLDQQEELGFTAKSPRWA 306 

Query: 299 IAYKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQLAGTTVSRATLHNVDYIAEKDIR 358
IAYKFPAEE ++L ++ VGRTGV+TPTA L PV++AGTTVSRA+LHN D I EKDIR

Sbjct: 307 IAYKFPAEEWTKLLDIELNVGRTGVITPTAILEPVKVAGTTVSRASLHNEDLIKEKDIR 366 

Query: 359 IGDTVVVYKAGDIIPAVLNVVMSKRNQQEVML-IPKLCPSCGSELVHFEGEVALRCINPL 417
I D VVV KAGDIIP V+NV++ +R +E +P CP CGSELV EGEVALRCINP
Sbjct: 367 ILDKVVVKKAGDIIPEVVNVLVDQRTGEEREFSMPTECPECGSELVRIEGEVALRCINPE 426

Query: 418 CPNQIKERLAHFASRDAMNITGFGPSLVEKLFDAHLIADVADIYRLSIENLLTLDGIKEK 477 

CP QI+E L HF SR+AMNI G G ++ +LF+ +L+ +VAD+Y+L+ E ++ L+ + EK 
Sbjct: 427 CPAQIREGLIHWSRNAMNIDGLGERVITQLFEENLVRNVADLYKLTKERVIQLERMGEK 486 


Query: 478 SATKIYHAIQSSKENSAEKLLFGLGIRHVGSKASRLLLEEFGNLRQLSQASQESIASIDG 537 

S + +IQ SKENS E+LLFGLGIR +GSKA++ L F +L L +AS+E + ++D 
Sbjct: 487 STENLISSIQKSKENSLERLLFGLGIRFIGSKARKTLAMHFESLENLKKASKEELLAVDE 546 

Query: 538 LGGVIAKSLHTFFEKEEVDKLLEELTSYNVNFNYLG KRVSTDAQLSGLTVVLTGKL 593

+G +A ++ T+F KEE+ +LL EL VN Y G K +D+ +G T+VLTGKL 
Sbjct: 547 IGEKMADAVITYFHKEEMLELLI^LQELGVNTLYKGPKK^nOffiDSDSYFAGKTIVLTGKL 606 

Query: 594 EKMTRNEAKEKLQNLGAKVTGSVSKKTDLIVAGSDAGSKLTKAQDLGITIQDEDWLL 650 
E+++RNEAK +++ LG K+TGSVSK TDL++AG AGSKLTKAQ+L I + +E+ L+

Sbjct: 607 EELSRNEAKAQIEALGGKLTGSVSKNTDLVIAGEAAGSKLTKAQELNIEVWNEEQLM 663 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4375> which encodes the amino acid 
sequence <SEQ ID 4376>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 363 - 379 ( 363 - 379) 
Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 472/652 (72%) , Positives = 556/652 (84%) 



Query: 1 MENRMNELVSLLNQYAKEYYTQDNPTVSDSQYDQLYRELVELEKQHPENILPNSPTHRVG 60
M+ R+ EL LLN+Y +YYT+D P+VSDS YD+LYRELV LE+ +PE +L +SPT +VG
Sbjct: 1 MKKRIKELTDLLNRYRYDYYTKDAPSVSDSDYDKLYRELVTLEQSYPEYVLQDSPTQQVG 60

Query: 61 GLVLEGFEKYQHEYPLYSLQDAFSKEELIAFDKRVKAEFPTAAYMAELKIDGLSVSLTYV 120
G +L+GFEKY+H+YPL+SLQDAFS+EEL AFDKRVKAEFP A Y+AELKIDGLS+SL+Y
Sbjct: 61 GTILKGFEKYRHQYPLFSLQDAFSREELDAFDKRVKAEFPNATYLAELKIDGLSISLSYE 120

Query: 121 NGVLQVGATRGDGNIGENITENLKRVHDIPLHLDQSLDITVRGECYLPKESFEAINIEKR 180
NG LQVGATRGDGNIGENITEN+K++ DIP L + L ITVRGE Y+ ++SF+AIN ++
Sbjct: 121 NGFLQVGATRGDGNIGENITENIKKIKDIPYQLSEPLTITVRGEAYMSRQSFKAINEARQ 180

Query: 181 ANGEQEFANPRNAAAGTLRQLNTGIVAKRKLATFLYQEASPTQKETQDDVLKELESYGFS 240
NGE EFANPRNAAAGTLRQL+T +VAKR+LATFLYQEASPT + Q++VL EL GFS
Sbjct: 181 ENGETEFANPRNAAAGTLRQLDTSVVAKRQLATFLYQEASPTARNQQNEVLAELADLGFS 240

Query: 241 VNHHRLISSSMEKIWDFIQTIEKDRVSLPYDIDGIVIKVNSIAMQEELGFTVKAPRWAIA 300
VN + ++SSM++IWDFI+TIE R L YDIDG+VIKVNS+AMQEELGFTVKAPRWAIA
Sbjct: 241 VNPYYQLTSSMDEIWDFIKTIEAKRDQLAYDIDGVVIKVNSLAMQEELGFTVKAPRWAIA 300

Query: 301 YKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQLAGTTVSRATLHNVDYIAEKDIRIG 360
YKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQLAGTTVSRATLHNVDYIAEKDIRIG
Sbjct: 301 YKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQLAGTTVSRATLHNVDYIAEKDIRIG 360

Query: 361 DTVVVYKAGDIIPAVLNVVMSKRNQQEVMLIPKLCPSCGSELVHFEGEVALRCINPLCPN 420
DTV+VYKAGDIIPAVLNVVMSKRNQQEVMLIPKLCPSCGSELVHFE EVALRCINPLCP+
Sbjct: 361 DTVIVYKAGDIIPAVLNVVMSKRNQQEVMLIPKLCPSCGSELVHFEDEVALRCINPLCPS 420

Query: 421 QIKERLAHFASRDAMNITGFGPSLVEKLFDAHLIADVADIYRLSIENLLTLDGIKEKSAT 480
I+ L HFASRDAMNITG GP++VEKLF A + DVADIY+L+ E+ + LDGIKEKSA
Sbjct: 421 LIQRSLEHFASRDAMNITGLGPAIVEKLFLAGFVHDVADIYQLTKEDFMQLDGIKEKSAD 480

Query: 481 KIYHAIQSSKENSAEKLLFGLGIRHVGSKASRLLLEEFGNLRQLSQASQESIASIDGLGG 540
K+ AI++SK NSAEKLLFGLGIRH+GSK SRL+LE +G++ L A +E IA IDGLG
Sbjct: 481 KLIAAIEASKSNSAEKLLFGLGIRHIGSKVSRLILEVYGDISALLTAKEEEIARIDGLGS 540

Query: 541 VIAKSLHTFFEKEEVDKLLEELTSYNVNFNYLGKRVSTDAQLSGLTVVLTGKLEKMTRNE 600
IA+SL +FE++ L++EL + VN +Y G++V++DA L GLTVVLTGKL ++ RNE
Sbjct: 541 TIAQSLTQYFEQKTAAILVDELKTAGVNMHYSGQKVNSDAALFGLTVVLTGKI^QIJSIRNE 600

Query: 601 AKEKLQNLGAKVTGSVSKKTDLIVAGSDAGSKLTKAQDLGITIQDEDWLLNL 652
AK+KL+ LGAKVTGSVSKKTDL++AGSDAGSKL KA+ LGI I+DEDWL L
Sbjct: 601 AKDKLEALGAKVTGSVSKKTDLVIAGSDAGSKLEKAKSLGIRIEDEDWLRQL 652





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1426 

A DNA sequence (GBSx1511) was identified in S.agalactiae <SEQ ID 4377> which encodes the amino
acid sequence <SEQ ID 4378>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.63 Transmembrane 110 - 126 ( 108 - 128) 
INTEGRAL Likelihood = -2.13 Transmembrane 142 - 158 ( 141 - 159) 
INTEGRAL Likelihood = -1.12 Transmembrane 75 - 91 ( 75 - 93) 
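Each INTEGRAL line reports a candidate transmembrane segment: a likelihood score (the lines are listed from most to least negative), the segment's residue range, and a second, wider range in parentheses. The minimal Python sketch below assumes exactly this line layout and uses a hypothetical `parse_tm_segments` helper. It re-orders the three segments predicted for Example 1426 by their position along the sequence.

```python
import re
from typing import List, NamedTuple

class TMSegment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int  # first number in parentheses
    outer_end: int    # second number in parentheses

TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text: str) -> List[TMSegment]:
    """Extract INTEGRAL lines and return the segments ordered by start position."""
    segments = [
        TMSegment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))
        for m in TM_LINE.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)

example_1426 = """
INTEGRAL Likelihood = -5.63 Transmembrane 110 - 126 ( 108 - 128)
INTEGRAL Likelihood = -2.13 Transmembrane 142 - 158 ( 141 - 159)
INTEGRAL Likelihood = -1.12 Transmembrane 75 - 91 ( 75 - 93)
"""
for seg in parse_tm_segments(example_1426):
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
# 75 91 17 -1.12
# 110 126 17 -5.63
# 142 158 17 -2.13
```

Every core segment listed in this section spans 17 residues, which the third printed column makes easy to check.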



WO 02/34771 



PCT/GB01/04789 






Final Results 

bacterial membrane Certainty=0.3251 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA68244 GB:X99978 citrulline cluster-linked gene [Lactobacillus
plantarum]

Identities = 56/158 (35%) , Positives = 91/158 (57%) , Gaps = 8/158 (5%) 

Query: 13 AIVTAIYIVLTITPPFNAIAYGAYQFRVSEMLNFLAFYHRKYLFAVTLGCMISNLYSFG- 71

A+V A+Y+VL + P ++A GA QFRVSE LN LA ++RKY++ + G ++ + + G 
Sbjct: 13 ALVAAMYVVLCLGPAAFSLASGAIQFRVSEGLNHLAVFNRKYIWGIVAGVILFDAFGPGA 72

Query: 72 -MIDVFVGGGSTLLFVYLGTILFKQYQKDYLFNGLINKAFFFFSFFFAASMITVAVELKI 130

+++V GGG +LL + + T L + K Ii+N A F S F A MIT+ + 
Sbjct: 73 • SLIjNVLFGGGQSLLALLVLTW^PKL-KTVWQRMLMIALFTVSMFMIALMITM M 126 

Query: 131 VAGLPLLLTWLTTAVGELASLLVGAVLVDKLSRHVDFT 168 

+G+ T+LTTA+ EL + +A++ LR + F+ 
Sbjct: 127 SSGVAFWPTYLTTALSELIIMSITAPIMYSLDRVLHFS 154 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4379> which encodes the amino acid 
sequence <SEQ ID 4380>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.41 Transmembrane 75 - 91 ( 70 - 94) 

INTEGRAL Likelihood = -3.82 Transmembrane 12 - 28 ( 8 - 28) 

INTEGRAL Likelihood = -2.28 Transmembrane 141 - 157 ( 140 - 158) 

INTEGRAL Likelihood = -0.64 Transmembrane 110 - 126 ( 110 - 126) 

INTEGRAL Likelihood = -0.59 Transmembrane 55 - 71 ( 54 - 73) 



Final Results 

bacterial membrane Certainty=0.2763 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/167 (68%), Positives = 137/167 (81%), Gaps = 1/167 (0%) 

Query: 1 MNTFTTRDYAHMAIVTAIYIVLTITPPFNAIAYGAYQFRVSEMLNFLAFYHRKYLFAVTL 60

M T DY H+ +V A+Y+VLTITPP NAI+YG YQFR+SEM+NFLAFYHRKY+ AVTL 
Sbjct: 1 MTKLTVHDYVHIGLVAALYVVLTITPPLNAISYGMYQFRISEMMNFLAFYHRKYIIAVTL 60

Query: 61 GCMISNLYSFGMIDVFVGGGSTLLFVYLGTILFKQYQKDYLFNGLINKAFFFFSFFFAAS 120 

GCMI+N YSFG+ IDVFVGGGSTL+FV LG ILF +YQKDYLFNG+ NKAF +FSFFFA S 
Sbjct: 61 GCMIANFYSFGLIDVFVGGGSTLIFVTLGVILFSKYQKDYLFNGIFNKAFVYFSFFFATS 120 

Query: 121 MITVAVELKIVAGLPLLLTWLTTAVGELASLLVGAVLVDKLSRHVDF 167

M VA+EL G P LLTW TTA+GEL SLL+G++++DKLS+ + F
Sbjct: 121 MFNVAIELYFF-GAPFLLTWFTTALGELVSLLIGSLIIDKLSQRISF 166

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1427 



A DNA sequence (GBSx1513) was identified in S.agalactiae <SEQ ID 4381> which encodes the amino
acid sequence <SEQ ID 4382>. Analysis of this protein sequence reveals the following: 
Possible site: 53

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.20 Transmembrane 255 - 271 ( 245 - 281) 
INTEGRAL Likelihood =-10.72 Transmembrane 141 - 157 ( 132 - 165) 
INTEGRAL Likelihood = -8.17 Transmembrane 189 - 205 ( 185 - 208)

INTEGRAL Likelihood = -7.01 Transmembrane 36 - 52 ( 33 - 60) 

Final Results 

bacterial membrane Certainty=0.5479 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35915 GB:AF071085 Orfde2 [Enterococcus faecalis]
Identities = 83/276 (30%) , Positives = 157/276 (56%) , Gaps = 3/276 (1%)

Query: 17 RPIQVFMRHFQSAEMDLSAIAVAYYLLVTAFPLLVIAANIFPYFHINVSDLLSLMQKNLP 76

R I+ H +AE+ S++ VAYYLL++ FPLL+ N+ PY I+ + +L + + +P
Sbjct: 15 RFIETTQSHMVTAEIGNSSVVVAYYLLLSLFPLLIAVGNVLPYLRIDPNSVLPYIAEAIP 74


Query: 77 KNIYEPASRLAVDAFSKPSTGILGFASLTAFWTMSKSLTSLQKAINKAYGVDQHRDFVIS 136 

K++Y+ ++ S G+L ++L AFW+ S+S+ +LQ A+NKA+GV+Q ++F++ 

Sbjct: 75 KDVYKNLEPAIRSLLTQRSGGLLSVSALAAFWSASQSINALQNAMNKAFGVEQRKNFILV 134

Query: 137 RLVGVGTGLIILFLLTFVLIFSTFSKPVLQIIVNMYDLGDTLTAWLLNLAQPVTFLTIFL 196

R+V L+ + + V++ + +++++ ++ ++ L P+T + + + 

Sbjct: 135 RWSFLVILLF^AIVGVWILGLGQYIIELLQPIFHYSTSVIDTFCALKWPLTTVVLLV 194 

Query: 197 GIGILYFILPNARIRKVRYVIPGTLFSTFVIGFFSNLISQYVLNRVEKMVDIKTFGSWI 256 
+ ++Y ++PN ++ +R ++PG +FST S + YV ++ + GS +

Sbjct: 195 IMCLIYAWPNRKL-SLRSILPGAIFSTVGWMLLSQIFGLYVKYFSSRIASYQIIGSFI- 252 

Query: 257 FILMLWFIFLAHIMILGAILNASVQEIATGKIESRR 292
ILMLW F A I+ILGAI+NA V E G E ++ 
Sbjct: 253 -ILMLWLNFAATIIILGAIVNAVVDEYLXGXKEKKQ 287

A related DNA sequence was identified in S.pyogenes <SEQ ID 4383> which encodes the amino acid 
sequence <SEQ ID 4384>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.58 Transmembrane 141 - 157 ( 132 - 168) 

INTEGRAL Likelihood =-12.15 Transmembrane 189 - 205 ( 177 - 210) 

INTEGRAL Likelihood =-11.68 Transmembrane 256 - 272 ( 245 - 280) 

INTEGRAL Likelihood = -7.54 Transmembrane 36 - 52 ( 33 - 60) 


Final Results 

bacterial membrane Certainty=0.6031 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA68244 GB:X99978 citrulline cluster-linked gene [Lactobacillus
plantarum] 

Identities = 53/170 (31%) , Positives = 92/170 (53%) , Gaps = 11/170 (6%) 

Query: 1 MTKLTVHDYVHIGLVAALYVVLTITPPLNAISYGMYQFRISEMMNFLAFYHRKYIIAVTL 60

MT+ + ++ LVAA+YVVL + P +++ G QFR+SE +N LA ++RKYI +
Sbjct: 1 MTQSKIRPWIINALVAAMYVVLCLGPAAFSLASGAIQFRVSEGLNHLAVFNRKYIWGIVA 60 

Query: 61 GCMIANFYSFG--LIDVFVGGGSTLIFVTLGVILFSKYQKDYLFNGIFNKAFVYFSFFFA 118
G ++ + + G L++V GGG +L+ + + LK+ +++ + + F 

Sbjct: 61 GVILFDAFGPGASLLNVLFGGGQSLLALLVLTWLAPKLKT VWQRMLLNIA- LFT 113 



Query: 119 TSMFNVA--IELYFFGAPFLLTWFTTALGELVSLLIGSLIIDKLSQRISF 166 
SMF +A I + G F T+ TTAL EL+ +I+I+ L++F
Sbjct: 114 VSMFMIALMITMMSSGVAFWPTYLTTALSELIIMSITAPIMYSLDRVLHF 163 
!GB:AF071085 Orfde2 [Enterococcus faecalis] 176 2e-43 

>GP:AAC35915 GB:AF071085 Orfde2 [Enterococcus faecalis] 
Identities = 90/271 (33%) , Positives = 155/271 (56%) , Gaps = 3/271 (1%) 

Query: 19 IQVFMRHLQSAEMDLSAIAVAYYLILTAFPLIVIAANIFPYLNIDIADLLRLMKQNLPKD 78
I+ H+ +AE+ S++ VAYYL+L+ FPL++ N+ PYL ID +L + + +PKD
Sbjct: 17 IETTQSHMVTAEIGNSSVVVAYYLLLSLFPLLIAVGNVLPYLRIDPNSVLPYIAEAIPKD 76

Query: 79 IFRPASAIVENIFSKPSGSVLGVATLTGLWTMSRSLTSLQKAINKAYGASQHRDFFIGHL 138
+++ + ++ ++ SG +L V+ L W+ S+S+ +LQ A+NKA+G Q ++F + +
Sbjct: 77 VYKNLEPAIRSLLTQRSGGLLSVSALAAFWSASQSINALQNAMNKAFGVEQRKNFILVRV 136

Query: 139
V L L+ + + ++ + I++L +H S ++ F L P+T +++ V +
Sbjct: 137

Query: 199
L+Y ++PN K+ +R ILPG +F++ LS + G YV Y R+ ++ GS
Sbjct: 197

Query: 259
+MLW F A I+ILGAI NA E G E
Sbjct: 254



An alignment of the GAS and GBS proteins is shown below. 

Identities = 188/302 (62%) , Positives = 244/302 (80%) 

Query: 1 MKLKKFFEDLLAKLEYRPIQVFMRHFQSAEMDLSAIAVAYYLLVTAFPLLVIAANIFPYF 60

M KK+F+ +L+K +Y PIQVFMRH QSAEMDLSAIAVAYYL++TAFPL+VIAANIFPY 
Sbjct: 1 MAEKKWFDKVLSKWQYEPIQVFMRHLQSAEMDLSAIAVAYYLILTAFPLIVIAANIFPYL 60 

Query: 61 HINVSDLLSLMQKNLPKNIYEPASRLAVDAFSKPSTGILGFASLTAFWTMSKSLTSLQKA 120

+I+++DLL LM++NLPK+I+ PAS + + FSKPS +LG A+LT WTMS+SLTSLQKA 
Sbjct: 61 NIDIADLLRLMKQNLPKDIFRPASAIVENIFSKPSGSVLGVATLTGLWTMSRSLTSLQKA 120

Query: 121 INKAYGVDQHRDFVISRLVGVGTGLIILFLLTFVLIFSTFSKPVLQIIVNMYDLGDTLTA 180

INKAYG QHRDF I LVG+ T LIILFLL F LIFS FSK +Q++ Y L D +T 
Sbjct: 121 INKAYGASQHRDFFIGHLVGLLTSLIILFLLAFALIFSIFSKAAIQVLDKHYHLSDNITT 180 

Query: 181 WLLNLAQPVTFLTIFLGIGILYFILPNARIRKVRYVIPGTLFSTFVIGFFSNLISQYVLN 240

L L QP+T L IF+G+ +LYF+LPN +I+K+RY++PGTLF++FV+ F SNL+ YV+ 
Sbjct: 181 IFLLLIQPITVLIIFVGLMLLYFLLPI^IKKIRYILPGTLFTSFvMTFLS^VGNYVVY 240 

Query: 241 RVEKMVDIKTFGSVVIFILMLWFIFLAHIMILGAILNASVQEIATGKIESRRGDIMSLIQ 300

VE+MVDIK FGSV+IFI+MLWFIFLA I+ILGAI NA+ QE++ GK+E R GD++++++ 
Sbjct: 241 NVERMVDIKMFGSVMIFIIMLWFIFLARILILGAIFNATYQEMSLGKLEGRSGDMIAILK 300

Query: 301 KS 302 
K+ 

Sbjct: 301 KT 302

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1428 

A DNA sequence (GBSx1514) was identified in S.agalactiae <SEQ ID 4385> which encodes the amino
acid sequence <SEQ ID 4386>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence
Final Results

bacterial cytoplasm Certainty=0.4200 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1429

A DNA sequence (GBSx1515) was identified in S.agalactiae <SEQ ID 4387> which encodes the amino
acid sequence <SEQ ID 4388>. This protein is predicted to be methionine aminopeptidase (map). Analysis 
of this protein sequence reveals the following: 

Possible site: 14 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2342 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9761> which encodes amino acid sequence <SEQ ID 9762>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35914 GB:AF071085 methionine aminopeptidase A [Enterococcus faecalis]

Identities = 101/207 (48%), Positives = 128/207 (61%), Gaps = 31/207 (14%)

Query: 1 MITLKSAREIEAMDRAGDFLASIHIGLRDIIKPGVDMWEVEEYVRRRCKEENVLPLQIGV 60 
MITLKS REIE MD +G+ LA +H LR IKPG+ W++E +VR + + QIG

Sbjct: 1 MITLKSPREIEMMDESGELLADVHRHLRTFIKPGITSWDIEVFVRDFIESHGGVAAQIGY 60 

Query: 61 DGAVMDYPYATCCGLNDEVAHAFPRHYTLKQGDLLKVDMVLSEPLDKSIVDVSSLNFDNV 120
+G Y YATCC +NDE+ H FPR LK GDL+KVDM + 
Sbjct: 61 EG YKYATCCSINDEICHGFPRKKVLKDGDLIKVDMCVD 98

Query: 121 AQMKKYTETYSGGLADSCWAYAVGEVSQEVKDLMSVTREAMYIGIEKAVIGNRIGDIGAA 180

G ++DSCW+Y VGE + E+ LM VT++A+Y+GIE+A +GNRIGDIG A 
Sbjct: 99 LKGAISDSCWSYVVGESTPEIDRLMEVTKKALYLGIEQAQVGNRIGDIGHA 149

Query: 181 IQDYAESRGYGWRDLVGHGVGPTMHE 207

IQ Y E GYGWRD VGHG+GPT+HE 
Sbjct: 150 IQTYVEGEGYGWRDFVGHGIGPTIHE 176 
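The percentages in summary lines such as "Identities = 101/207 (48%), Positives = 128/207 (61%), Gaps = 31/207 (14%)" appear to be integer percentages rounded down rather than to the nearest value: 101/207 is about 48.8%, 128/207 about 61.8% and 31/207 about 14.98%. The Python sketch below reproduces that summary line under this assumption, which is inferred from the figures reported here rather than taken from BLAST documentation; the function names are illustrative only and are not part of BLAST or of the original analysis.

```python
def blast_percent(count: int, length: int) -> int:
    """Integer percentage rounded down (assumed to match the reports above)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    # Floor division keeps 101/207 at 48 instead of rounding it up to 49.
    return (100 * count) // length


def summary_line(identities: int, positives: int, gaps: int, length: int) -> str:
    """Format a BLAST-style summary line from raw counts (hypothetical helper)."""
    parts = [("Identities", identities), ("Positives", positives), ("Gaps", gaps)]
    return ", ".join(
        f"{name} = {n}/{length} ({blast_percent(n, length)}%)" for name, n in parts
    )


if __name__ == "__main__":
    # Counts from the E. faecalis methionine aminopeptidase A hit above.
    print(summary_line(101, 128, 31, 207))
```

Rounding to the nearest integer would instead give 49%, 62% and 15%, which do not match the values printed in this example, so floor division was used.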

A related DNA sequence was identified in S.pyogenes <SEQ ID 4389> which encodes the amino acid
sequence <SEQ ID 4390>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2082 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below.
Identities = 256/286 (89%), Positives = 273/286 (94%)

Query: 1 MITLKSAREIEAMDRAGDFLASIHIGLRDIIKPGVDMWEVEEYVRRRCKEENVLPLQIGV 60
MITLKSAREIEAMDRAGDFLA IHIGLRDIIKPGVDMWEVE YVRRRCKE+NVLPLQIGV
Sbjct: 1 MITLKSAREIEAMDRAGDFIAGIHIGLRDIIKPGVDMWEVEAYVRRRCKEDNVLPLQIGV 60

Query: 61 DGAVMDYPYATCCGLNDEVAHAFPRHYTLKQGDLLKVDMVLSEPLDKSIVDVSSLNFDNV 120
Sbjct: 61 DGHMMDYPYATCCGLNDEVAHAFPPJIYILKEGDLLKVDMVLSEPLDKSIVDVAALDFDIIV 120

Query: 121 AQMKKYTETYSGGLADSCWAYAVGEVSQEVKDLMSVTREAMYIGIEKAVIGNRIGDIGAA 180
+MKK+T +Y+GGLADSCWAYAVG S E+K LM VT+EAMY GIEKAVIGNRIGDIGAA
Sbjct: 121 PEMKKWTGSYTGGLADSCWAYAVGTPSDEIKQLMDVTKEAMYRGIEKAVIGNRIGDIGAA 180

Query: 181 IQDYAESRGYGWRDLVGHGVGPTMHEEPMVPNYGTAGRGLRLREGMVLTIEPMINTGTW 240
+Q+YAES GYGWRDLVGHGVGPTMHEEPMVPNYGTAGRGLRL+EGMVLT+EPMINTGTW
Sbjct: 181 VQEYAESFGYGWRDLVGHGVGPTMHEEPMVPNYGTAGRGLRLKEGMVLTVEPMINTGTW 240

Query: 241 EIDTDMKTGWAHKTLDGGLSCQYEHQFVITKDGPVILTSQGEERTY 286
EIDTD+KTGWAHKTLDGGLSCQYEHQFVITKDGPVILTSQGEERTY
Sbjct: 241 EIDTDIKTGWAHKTLDGGLSCQYEHQFVITKDGPVILTSQGEERTY 286
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1430 

A DNA sequence (GBSx1516) was identified in S.agalactiae <SEQ ID 4391> which encodes the amino
acid sequence <SEQ ID 4392>. Analysis of this protein sequence reveals the following: 
Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3473 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9759> which encodes amino acid sequence <SEQ ID 9760> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06894 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 158/431 (36%), Positives = 270/431 (61%), Gaps = 6/431 (1%)

Query: 6 SKHQEILEYLENLAVGKRVSVRSISNHLKVSDGTAYRAIKEAENRGIVETRPRSGTVRVA 65
+KH++IL+Y+ NL VG+++SVR I+ L+VS+GTAYRAIKEAEN+G+V T R GT+R+
Sbjct: 3 TKHEQILQYITNLEVGEKISVRRIAKDLQVSEGTAYRAIKEAENQGLVSTIERVGTIRIE 62

Query: 66 QKAKVNIEKLTYAEIARISDSQVVAGIEGLSKEFSKFSIGAMTHRNIEKYLVQGGLLIVG 125
+K K NIEKLTYAE+ I D QV+ G +GL K ++F I GAM + +Y+ G LLIVG
Sbjct: 63 KKQKENIEKLTYAEvWIVDGQVLGGRDGLHKTLNRFVIGAMKLDM^RYVEPGNLLIVG 122

Query: 126 DRDEIQHIALQHQNAILVTGGFNVSPSVCRLADKLQIPVMVTHYDTFTVSTMINHTLSNA 185
+R ++ +AL+ A+L+TGGF+ S +LAD+L +PV+ T YDTFTV+TMIN + +
Sbjct: 123 NRYQVHQIALFAGAAVLITGGFDTSDEAIKLADELDLPVISTSYDTFTVATMINRAIYDQ 182

Query: 186 KIRTDLKTVEQVYQSQMDYGFLAQDDTVKEFNLLVKQTKNVRFPIVNQANVVVGVVSVQD 245
I+ ++ V+ + D ++ ++ V +++ L ++T + R+P++++ + G+V+ +D
Sbjct: 183 LIKKEITLVDDILIPLQDTYYMTTENWGKraELNEKTGHSRYPVIDENMKIQGMVAAKD 242

Query: 246 ILGKDKEVKLATVMSKNIIVAKPRMSLANISQKMIFEDLNMMPVVSDDFELLGVITRRQA 305
+L + + VM+KN I R S+A ++ M++E + ++PV+ +L+GV++R+
Sbjct: 243 VlNASRHTPIEKvMTKNPIWSERTSVAAVAHvMVWEGIELLPVIDSHRKLIGWSRQDV 302
Query: 306 VENLSMSQ GTDLYTYSDQILSNLQIEDG-HFSFLVEPAMIDHTGSLTQGVLTEFL 359

++LMQ G+ L+ +G + +PM + G+++ GV+T + 

Sbjct: 303 LKALQMIQRQPHVGETIEDLMTNGLNESSSDQGDSYEVEITPQMTNQLGTISHGVMTSLV 362 


Query: 360 KEICIRVLTRKHQRSIVVKQMTLYFLQPVQIDEIIMVTPTIISEKRREATLDLELKLENK 419

E RVL + + +W+ +TLYFL+PVQID + + P ++ R+ +D+E+ E + 
Sbjct: 363 IESGSRVLRKYKKGDLWENITLYFLKPVQIDSRLTIRPRVLEIGRKHGKIDVEMYHEGE 422 

Query: 420 IIAKAMIAVKI 430

I+ KA+ +I
Sbjct: 423 IVGKALFMAQI 433 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4393> which encodes the amino acid 
sequence <SEQ ID 4394>. Analysis of this protein sequence reveals the following:
Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3011 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 267/431 (61%), Positives = 351/431 (80%)

Query: 1 MIIVMSKHQEILEYLENLAVGKRVSVRSISNHLKVSDGTAYRAIKEAENRGIVETRPRSG 60

+II+MSKHQ+IL+YLE LA+GK+VSVRSISNHLKVSDGTAYRAIKEAENRGIVET+PRSG
Sbjct: 1 VIIIMSKHQDILDYLEKIAIGKKVSVRSISNHLKVSDGTAYRAIKEAENRGIVETKPRSG 60

Query: 61 TVRVAQKAKVNIEKLTYAEIARISDSQVVAGIEGLSKEFSKFSIGAMTHRNIEKYLVQGG 120

TVR+ +K +V I++LTY+EIARISDS+V+AG GL EFS+FSIGAMT +NI +YLV+GG 
Sbjct: 61 TVRIEKKGRVRIDRLTYSEIARISDSEVLAGHAGLGHEFSRFSIGAMTQQNIRRYLVKGG 120 

Query: 121 LLIVGDRDEIQHIALQHQNAILVTGGFNVSPSVCRLADKLQIPVMVTHYDTFTVSTMINH 180

LLIVGDR+ IQ LAL++ NAILVTGGF VS V +A+ +IPVMVTHYDTFTV+TMINH 
Sbjct: 121 LLIVGDRETIQLLALENHNAILVTGGFPVSKRVIEMANNQRIPVMVTHYDTFTVATMINH 180 

Query: 181 TLSNAKIRTDLKTVEQVYQSQMDYGFLAQDDTVKEFNLLVKQTKNVRFPIVNQANVVVGV 240 
LSN + I +TDLKTVEQV DYG+L +D +V+EFN L+K+T+ VRFP+++ V+GV

Sbjct: 181 ALSNIRIKTDLKTVEQVMIPITDYGYLCEDSSVEEFNTLIKKTRQVRFPVLDYKRKVIGV 240 

Query: 241 VSVQDILGKDKEVKLATVMSKNIIVAKPRMSLANISQKMIFEDLNMMPVVSDDFELLGVI 300
VS++D++ + KL VMSKN I A+P SLANISQKMIFEDLNM+PV ++ LLG+I
Sbjct: 241 VSMRDVVDQLPTTKLTKVMSKNPITARPNTSLANISQKMIFEDLNMLPVTDEENNLLGMI 300

Query: 301 TRRQAVENLSMSQGTDLYTYSDQILSNLQIEDGHFSFLVEPAMIDHTGSLTQGVLTEFLK 360 

TRRQA+ENL Q + YTYS+QILSNL+ ++ +VEP MID G+++ GV++EFLK 
Sbjct: 301 TRRQAMENLPNHQPNNPYTYSEQILSNLEETVDYYQVVVEPTMIDSAGNMSNGVISEFLK 360 


Query: 361 EICIRVLTRKHQRSIVVKQMTLYFLQPVQIDEIIMVTPTIISEKRREATLDLELKLENKI 420

EI IR LT+KHQ++I+++QM +YFL +QI++ + + P II+E RR +T+D+E+ +++++ 
Sbjct: 361 EISIRALTKKHQKNIIIEQMMVYFLHAIQIEDELKIYPKIITENRRSSTIDIEIFVDDQV 420 

Query: 421 IAKAMIAVKIN 431

IAKA+I KIN 
Sbjct: 421 IAKAIITTKIN 431






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1431

A DNA sequence (GBSx1517) was identified in S.agalactiae <SEQ ID 4395> which encodes the amino
acid sequence <SEQ ID 4396>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2837 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04556 GB:AP001510 unknown conserved protein [Bacillus halodurans]
Identities = 56/185 (30%), Positives = 86/185 (46%), Gaps = 4/185 (2%)

Query: 7 MDIWTNLGRFAFIETEHVNLRPVAYTDREAFWRIASKRTNLQFI-FPVQTSKKESDFLLV 65
M+I G +ETE + LR D A + AS +++ + S K+S+ L
Sbjct: 1 MEIEDIYGDLPTLETERLRLRKFYKDDAAAIYDYASNEQVTKYVLWETHQSIKDSEAFLA 60

Query: 66 HSFMK---EPLGVWAIEDKVSHKMFGVIRFENIDLSKKTAEIGYFLKESSWGQGIMTECL 122
+ K + + WAIE K + +M G + F KTAE+GY L E WGQGIMTE +
Sbjct: 61 FAIJSrKYDEKDVSPWAIELKRNERMIGTvDFVWWKPKDKTAELGYVLSEPYWGQGIMTEAV 120

Query: 123
L F F ++++ ENI+S +V KA + + + RD+
Sbjct: 121

Query: 183
R DY
Sbjct: 181 IREDY 185

A related DNA sequence was identified in S.pyogenes <SEQ ID 667> which encodes the amino acid 
sequence <SEQ ID 668>. Analysis of this protein sequence reveals the following: 

Possible site: 52 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1096 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/177 (53%), Positives = 117/177 (65%)

Query: 7 MDIWTNLGRFAFIETEHVNLRPVAYTDREAFWRIASKRTNLQFIFPVQTSKKESDFLLVH 66

MDIWT L FAF ET V LRP YD F+ + + NL ++FP Q +K SD+LLVH 
Sbjct: 1 MDIWTKIAVFAFFETPKVILRPFRYEDHWDFYSMVNDTKNLYYVFPEQKTKAASDYLLVH 60 

Query: 67 SFMKEPLGVWAIEDKVSHKMFGVIRFENIDLSKKTAEIGYFLKESSWGQGIMTECLKTLS 126 
SF+K PLG WAIEDK +H++ G IR E+ D + A+IGYFL + WGQGIMTE + L

Sbjct: 61 SFIKFPLGQWAIEDKATHQVIGSIRIEHYDAKTRCADIGYFLNYAFWGQGIMTEWIKLV 120 

Query: 127 FFAFREFGMDKLIIVTHKENIASQKVALKAHFKQSRSFKGSDRYTRRIRDYIEFQLT 183
+ +F EFG+ L I+TH EN ASQKVA KA F+ FKGSDR T +I Y +QLT
Sbjct: 121 YLSFHEFGLKTLRIITHLENKASQKVAKKAGFQLKTCFKGSDRNTHKICIYKMYQLT 177



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1432

A DNA sequence (GBSx1518) was identified in S.agalactiae <SEQ ID 4397> which encodes the amino
acid sequence <SEQ ID 4398>. This protein is predicted to be UDP-N-acetylglucosamine-1-carboxyvinyl
transferase (murA). Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.63 Transmembrane 25 - 41 ( 24 - 42) 

Final Results 

bacterial membrane Certainty=0.3251 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF86297 GB:AF072894 UDP-N-acetylglucosamine-1-carboxyvinyl
transferase [Listeria monocytogenes]
Identities = 240/412 (58%), Positives = 303/412 (73%), Gaps = 2/412 (0%)



Query: 3 KIIINGGKQLTGEVAVSGAKNSVVALIPATILADDVVVLDGVPAISDVDSLVDIMETMGA 62
K+II GGK+L G + V GAKNS VALIPA ILA+ VVL+G+P ISDV +L +I+E +G
Sbjct: 20 KLIIRGGKKLAGTLQVDGAKNSAVALIPAAILAESEVVLEGLPDISDVHTLYNILEELGG 79

Query: 63 KIKRYGETLEIDPCGVKDIPMPYGKINSLRASYYFYGSLLGRYGQATLGLPGGCDLGPRP 122
++ +T IDP + +P+P G + LRASYY G++LGR+ +A +GLPGGC LGPRP
Sbjct: 80 TVRYDNKTAVIDPTDMISMPLPSGNVKKLRASYYLMGAMLGRFKKAVIGLPGGCYLGPRP 139

Query: 123 IDLHLKAFEAMGASVSYEGDSMRLATNGKPLQGANIYMDTVSVGATINTIIAAAKANGRT 182
ID H+K FEA+GA V+ E ++ L + L+GA IY+D VSVGATIN ++AA +A G+T
Sbjct: 140 IDQHIKGFEALGAKVTNEQGAIYLRAD--ELKGARIYLDVVSVGATINIMLAAVRAKGKT 197

Query: 183 VIENAAREPEIIDVATLLNNMGAHIRGAGTDVITIEGVKSLHGTRHQVIPDRIEAGTYIA 242
VIENAA+EPEIIDVATLL NMGA I+GAGTD I I GV+ LHG H +IPDRIEAGT++
Sbjct: 198 VIENAAKEPEIIDVATLLTNMGAIIKGAGTDTIRITGVEHLHGCHHTIIPDRIEAGTFMV 257

Query: 243 MAAAIGRGIKVTNVLYEHLESFIAKLDEMGVRMTVEEDSIFVEEQERLKAVSIKTSPYPG 302
+AAA G+G+++ NV+ HLE IAKL EMGV M +EED+IFV E E++K V IKT YPG
Sbjct: 258 LAAASGKGVRIENVIPTHLEGIIAKLTEMGVPMDIEEDAIFVGEVEKIKKVDIKTYAYPG 317

Query: 303 FATDLQQPLTPLLLTAEGNGSLLDTIYEKRVNHVPELARMGANISTLGGKIVYSGPNQLS 362
F TDLQQPLT LL AEG+ + DTIY R H+ E+ RMG G V +GP QL
Sbjct: 318 FPTDLQQPLTALLTRAEGSSVITDTIYPSRFKHIAEIERMGGKFKLEGRSAVINGPVQLQ 377

Query: 363 GAPVKATDLRAGAALVIAGLMAEGRTEITNIEFILRGYSNIIEKLTSLGADI 414
G+ V ATDLRAGAALVIA L+A+G TEI +E I RGYS IIEKL+++GA+I
Sbjct: 378 GSKVTATDLRAGAALVIAALLADGETEIHGVEHIERGYSKIIEKLSAIGANI 429





A related DNA sequence was identified in S.pyogenes <SEQ ID 4399> which encodes the amino acid 
sequence <SEQ ID 4400>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.70 Transmembrane 25 - 41 ( 23 - 45) 

Final Results 

bacterial membrane Certainty=0.4482 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAF86297 GB:AF072894 UDP-N-acetylglucosamine-1-carboxyvinyl
transferase [Listeria monocytogenes]
Identities = 244/412 (59%), Positives = 302/412 (73%), Gaps = 2/412 (0%)
Query: 3 KIIINGGKALSGEVAVSGAKNSVVALIPAIILADDIVILDGVPAISDVDSLIEIMELMGA 62

K+II GGK L+G + V GAKNS VALIPA ILA+ V+L+G+P ISDV +L I+E +G 
Sbjct: 20 KLIIRGGKKLAGTLQVDGAKNSAVALIPAAILAESEVVLEGLPDISDVHTLYNILEELGG 79 

Query: 63 TVNYHGDTLEIDPRGVQDIPMPYGKINSLRASYYFYGSLLGRFGQAVVGLPGGCDLGPRP 122

TV Y T IDP + +P+P G + LRASYY G++LGRF +AV+GLPGGC LGPRP 
Sbjct: 80 TVRYDNKTAVIDPTDMISMPLPSGNVKKLRASYYLMGAMLGRFKKAVIGLPGGCYLGPRP 139

Query: 123 IDLHLKAFEAMGVEVSYEGENMNLSTNGQKIHGAHIYMDTVSVGATINTMVAATKAQGKT 182

ID H+K FEA+G +V+ E + L + ++ GA IY+D VSVGATIN M+AA +A+GKT
Sbjct: 140 IDQHIKGFEALGAKVTNEQGAIYLRAD--ELKGARIYLDVVSVGATINIMLAAVRAKGKT 197

Query: 183 VIENAAREPEIIDVATLLNNMGAHIRGAGTDIITIQGVQKLHGTRHQVIPDRIEAGTYIA 242

VIENAA+EPEIIDVATLL NMGA I+GAGTD I I GV+ LHG H +IPDRIEAGT++
Sbjct: 198 VIENAAKEPEIIDVATLLTNMGAIIKGAGTDTIRITGVEHLHGCHHTIIPDRIEAGTFMV 257

Query: 243 LAAAIGKGVKITNVLYEHLESFIAKLEEMGVRMTVEEDAIFVEKQESLKAITIKTSPYPG 302

LAAA GKGV+I NV+ HLE IAKL EMGV M +EEDAIFV + E +K + IKT YPG
Sbjct: 258 LAAASGKGVRIENVIPTHLEGIIAKLTEMGVPMDIEEDAIFVGEVEKIKKVDIKTYAYPG 317

Query: 303 FATDLQQPLTPLLLKADGRGTIIDTIYEKRINHVPELMRMGADISVIGGQIVYQGPSRLT 362 

F TDLQQPLT LL +A+G I DTIY R H+ E+ RMG + G V GP +L 
Sbjct: 318 FPTDLQQPLTALLTRAEGSSVITDTIYPSRFKHIAEIERMGGKFKLEGRSAVINGPVQLQ 377 

Query: 363 GAQVKATDLRAGAALVTAGLIAEGKTEITNIEFILRGYASIIAKLTALGADI 414 

G++V ATDLRAGAALV A L+A+G+TEI +E I RGY+ II KL+A+GA+I 
Sbjct: 378 GSKVTATDLRAGAALVIAALLADGETEIHGVEHIERGYSKIIEKLSAIGANI 429 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 344/419 (82%), Positives = 394/419 (93%)

Query: 1 MRKIIINGGKQLTGEVAVSGAKNSVVALIPATILADDVVVLDGVPAISDVDSLVDIMETM 60

MRKIIINGGK L+GEVAVSGAKNSVVALIPA ILADD+V+LDGVPAISDVDSL++IME M
Sbjct: 1 MRKIIINGGKALSGEVAVSGAKNSVVALIPAIILADDIVILDGVPAISDVDSLIEIMELM 60

Query: 61 GAKIKRYGETLEIDPCGVKDIPMPYGKINSLRASYYFYGSLLGRYGQATLGLPGGCDLGP 120 

GA + +G+TLEIDP GV+DIPMPYGKINSLRASYYFYGSLLGR+GQA +GLPGGCDLGP 
Sbjct: 61 GATVNYHGDTLEIDPRGVQDIPMPYGKINSLRASYYFYGSLLGRFGQAVVGLPGGCDLGP 120

Query: 121 RPIDLHLKAFEAMGASVSYEGDSMRLATNGKPLQGANIYMDTVSVGATINTIIAAAKANG 180

RPIDLHLKAFEAMG VSYEG++M L+TNG+ + GA+IYMDTVSVGATINT++AA KA G 
Sbjct: 121 RPIDLHLKAFEAMGVEVSYEGENMNLSTNGQKIHGAHIYMDTVSVGATINTMVAATKAQG 180

Query: 181 RTVIENAAREPEIIDVATLLNNMGAHIRGAGTDVITIEGVKSLHGTRHQVIPDRIEAGTY 240 

+TVIENAAREPEIIDVATLLNNMGAHIRGAGTD+ITI+GV+ LHGTRHQVIPDRIEAGTY 
Sbjct: 181 KTVIENAAREPEIIDVATLLNNMGAHIRGAGTDIITIQGVQKLHGTRHQVIPDRIEAGTY 240 

Query: 241 IAMAAAIGRGIKVTNVLYEHLESFIAKLDEMGVRMTVEEDSIFVEEQERLKAVSIKTSPY 300 

IA+AAAIG+G+K+TNVLYEHLESFIAKL+EMGVRMTVEED+IFVE+QE LKA++IKTSPY
Sbjct: 241 IALAAAIGKGVKITNVLYEHLESFIAKLEEMGVRMTVEEDAIFVEKQESLKAITIKTSPY 300

Query: 301 PGFATDLQQPLTPLLLTAEGNGSLLDTIYEKRVNHVPELARMGANISTLGGKIVYSGPNQ 360 

PGFATDLQQPLTPLLL A+G G+++DTIYEKR+NHVPEL RMGA+IS +GG+IVY GP++
Sbjct: 301 PGFATDLQQPLTPLLLKADGRGTIIDTIYEKRINHVPELMRMGADISVIGGQIVYQGPSR 360

Query: 361 LSGAPVKATDLRAGAALVIAGLMAEGRTEITNIEFILRGYSNIIEKLTSLGADIQLVEE 419 

L+GA VKATDLRAGAALV AGL+AEG+TEITNIEFILRGY++II KLT+LGADIQL+E+ 
Sbjct: 361 LTGAQVKATDLRAGAALVTAGLIAEGKTEITNIEFILRGYASIIAKLTALGADIQLIED 419 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1433

A DNA sequence (GBSx1519) was identified in S.agalactiae <SEQ ID 4401> which encodes the amino
acid sequence <SEQ ID 4402>. This protein is predicted to be thiamine phosphate pyrophosphorylase 
(thiE). Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0422 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF25544 GB:AF109218 ThiE [Staphylococcus carnosus] 
Identities = 98/200 (49%), Positives = 140/200 (70%), Gaps = 1/200 (0%)

Query: 5 LKLYFVCGTVDCSR-KNILTVVEFALQAGITLFQFREKGFTALCGKEKIAMAKQLQILCK 63 

L +YF+CGT D +I V++FAL+ GITL+QFREKG A G++K+A+AK+LQ LCK
Sbjct: 7 LNVYFICGTQDIPEGRTIQEVLKEALEGGITLYQFREKGNGAKTGQDKVALAKELQALCK 66

Query: 64 QYQVPFIIDDDIDLVELIDADGLHIGQNDLPVDEARRRLPDKIIGLSVSTMDEYQKSQLS 123 

Y VPFI++DD+ L E IDADG+H+GQ+D VD+ R KIIGLS+ ++E S L+ 
Sbjct: 67 SYNVPFIVNDDVALAEEIDADGIHVGQDDEAVDDFNNRFEGKIIGLSIGNLEELNASDLT 126 

Query: 124 VVDYIGIGPFNPTQSKADAKPAVGNRTTKAVREINQDIPIVAIGGITSDFVHDIIESGAD 183

VDYIG+GP T SK DA VG + + +R+ D+PIVAIGGI+ D V ++ ++ AD 
Sbjct: 127 YVDYIGVGPIFATPSKDDASEPVGPKMIETLRKEVGDLPIVAIGGISLDNVQEVAKTSAD 186

Query: 184 GIAVISAISKANHIVDATRQ 203 
G++VISAI+++ H+ + +

Sbjct: 187 GVSVISAIARSPHVTETVHK 206 
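In each alignment block, the middle line echoes identical residues, marks conservative ("positive") substitutions with "+", and leaves every other column blank. The Python sketch below recomputes only the identity part of that line. It assumes two equal-length, already aligned strings with "-" for gaps, and it deliberately does not reproduce the "+" marks, which would need a substitution matrix such as BLOSUM62. Applied to the final ThiE segment above (query 184-203 against subject 187-206), it finds 7 identical positions, matching the letters G, VISAI and H in the printed middle line. The function name is illustrative; the sketch needs Python 3.9 or later for the `tuple[...]` annotation.

```python
def identity_midline(query: str, sbjct: str) -> tuple[str, int, int]:
    """Return (midline, identities, gap_columns) for two aligned sequences.

    Identical residues are echoed and every other column becomes a space.
    Positive-scoring substitutions ('+' in BLAST output) are NOT marked here.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    midline = []
    identities = 0
    gaps = 0
    for q, s in zip(query, sbjct):
        if q == "-" or s == "-":
            gaps += 1              # gap column: never counted as an identity
            midline.append(" ")
        elif q == s:
            identities += 1        # identical residue: echo the letter
            midline.append(q)
        else:
            midline.append(" ")    # mismatch (BLAST may print '+' here)
    return "".join(midline), identities, gaps


if __name__ == "__main__":
    q = "GIAVISAISKANHIVDATRQ"  # ThiE query 184-203
    s = "GVSVISAIARSPHVTETVHK"  # S. carnosus subject 187-206
    line, ids, gaps = identity_midline(q, s)
    print(q)
    print(line)
    print(s)
    print(f"{ids}/{len(q)} identical, {gaps} gap columns")
```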

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1434 

A DNA sequence (GBSx1520) was identified in S.agalactiae <SEQ ID 4403> which encodes the amino
acid sequence <SEQ ID 4404>. This protein is predicted to be hydroxyethylthiazole kinase (b2104). 
Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.94 Transmembrane 198 - 214 ( 194 - 217) 

Final Results 

bacterial membrane Certainty=0.2975 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 8805> which encodes amino acid sequence <SEQ ID 8806> 
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -2.93 
GvH: Signal Score (-7.5): 1.61 
Possible site: 39 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -4.94 threshold: 0.0

INTEGRAL Likelihood = -4.94 Transmembrane 183 - 199 ( 179 - 202) 
PERIPHERAL Likelihood = 2.49 151 
modified ALOM score: 1.49 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.2975 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF25543 GB:AF109218 ThiM [Staphylococcus carnosus] 
Identities = 114/253 (45%), Positives = 160/253 (63%), Gaps = 1/253 (0%)



Query: 18 LEQLKEWPLTICITNNWKNFTANGLLALGASPAMSECIEDLEDLLKVADALLINIGTL 77
L+Q++ +PL IC TN+WKNFTANGLL+LGASP MSE ++ ED VA ++LINIGTL
Sbjct: 5 LDQIRTEHPLVICYTNDWKNFTANGLLSLGASPTMSEAPQEAEDFYPVAGSVLINIGTL 64

Query: 78 TKESWQLYQEAIKIANKNQVPVVLDPVAAGASRFRLEVSLDLLKNYSISLLTGNGSEIAA 137
TK E KIAN+ + P+V DPVA GAS++R + LK +++ GN SEI A
Sbjct: 65 TKHHEHAMLENAKIANETETPLVFDPVAVGASKYRKDFCKYFLKKIKPTVTKGNASEILA 124

Query: 138 LIGEKQASKGADGGKVADLESIAVKANQVFDVPVVVTGETDAIAVRGEVRLLQNGSPLMP 197
LI + KG D D+ IA KA + + +++TGETD I +V L NGS +
Sbjct: 125 LIDDTATMKGTDSADNLDVVDIAEKAYKEYQTAIILTGETDVIVQDNKVVKLSNGSHFLA 184

Query: 198 LVTGTGCLLGAVLAAFIGSSDRSDDLACLTEAMTVYNVAGEIAEKVAKGKGVGSFQVAFL 257
+TG GCLLGAV+ AF+ + + L EA++VYN+A E AE+++ KG G+F F+
Sbjct: 185 KITGAGCLLGAVVGAFL-FRNTHPSIETLIEAVSVYNIAAERAEQLSDSKGPGTFLTQFI 243

Query: 258 DALSQMKSEMIMD 270
DAL ++ S+ + +
Sbjct: 244 DALYRIDSDAVAE 256





No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8806 (GBS398) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 6; MW 31.8kDa). 

The GBS398-His fusion product was purified (Figure 214, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 314), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1435 

A DNA sequence (GBSx1521) was identified in S.agalactiae <SEQ ID 4405> which encodes the amino
acid sequence <SEQ ID 4406>. This protein is predicted to be ThiD (thiD). Analysis of this protein 
sequence reveals the following: 

Possible site: 44 

>>> Seems to have an uncleavable N-term signal seq 



Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco




The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF25542 GB:AF109218 ThiD [Staphylococcus carnosus] 
Identities = 139/258 (53%), Positives = 186/258 (71%), Gaps = 4/258 (1%)

Query: 8 LTIAGTDPSGGAGIMADLKTFQARRTYGMAVVTSVVAQNTCGVRGVQHIETAIIDQQLAC 67
LTIAGTDP+GGAG+MADLK+F A YGMA +TS+VAQNT GV+ + +++ + +QL
Sbjct: 8 LTIAGTDPTGGAGVMADLKSFHACGWGMAAITSIVAQNTKGVQHIHNLDITWLKEQLDS 67

Query: 68
++DD P+A+KTGM+A +E + L+ SYL+KYP PYV+DPVM+A SG L+D AL+
Sbjct: 68

Query: 127
E LLPLA + TPNLPEAE +VG+ L E +I KAG + + V+IKGGH++ +A
Sbjct: 128

Query: 185
KDYLF K+GL ++R +T HTHGTGCTF+AV+ AELAKG++I AV AK FI +I
Sbjct: 188

Query: 245
+ PE+G G GPVNH +Y
Sbjct: 247 KYTPEIGQGRGPVNHFAY

There is also homology to SEQ ID 4408.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1436 

A DNA sequence (GBSx1522) was identified in S.agalactiae <SEQ ID 4409> which encodes the amino
acid sequence <SEQ ID 4410>. This protein is predicted to be TenA (tenA). Analysis of this protein
sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2242 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF25541 GB:AF109218 TenA [Staphylococcus carnosus] 
Identities = 78/213 (36%), Positives = 127/213 (59%), Gaps = 6/213 (2%)

Query: 14 IQSIYQDPFIQGIIKGRLDHDVICHYLQADNIYLGKFADIYALCLAKSDNLRDKQFFLEQ 73
I IYQD FIQ ++KG + + + YL+AD YL +FA+IYAL + +L +F ++Q
Sbjct: 15 IDEIYQDHFIQELLKGDIKKEALRQYLRADASYLREFANIYALLIPIMPDLESVRFLVDQ 74

Query: 74 IDFTLNRELADGEGPHQAIAAYTNRSYQDIIEKGVWYPSADHYIKHMYFHFY-ENGIAGA 132
I F +N E+ H+ +A Y +Y +I++K VW PS DHYIKHMY++ Y A A
Sbjct: 75 IQFIVNGEVE AHEYMADYIGENYNEIVQKKVWPPSGDHYIKHMYYNVYAHENAAYA 130

Query: 133
+AAM+PCP++Y +AK+ +++ + W FY N ++ L+E +M+ N+
Sbjct: 131

Query: 193
S+ ++ ++ + +++S HE FF MA EKW+
Sbjct: 190
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1437 

A DNA sequence (GBSx1523) was identified in S.agalactiae <SEQ ID 4411> which encodes the amino
acid sequence <SEQ ID 4412>. Analysis of this protein sequence reveals the following: 






Possible site: 35 

>>> Seems to have a cleavable N-term signal seq.



INTEGRAL Likelihood = -7.06 Transmembrane 43 - 59 ( 36 - 63)
INTEGRAL Likelihood = -2.55 Transmembrane 92 - 108 ( 92 - 112)
INTEGRAL Likelihood = -1.49 Transmembrane 135 - 151 ( 135 - 151)
INTEGRAL Likelihood = -1.06 Transmembrane 69 - 85 ( 69 - 85)
INTEGRAL Likelihood = -0.22 Transmembrane 216 - 232 ( 216 - 232)

Final Results

bacterial membrane Certainty=0.3824 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA91230 GB:Z56283 orf2 [Lactobacillus helveticus] 
Identities = 46/215 (21%), Positives = 96/215 (44%), Gaps = 3/215 (1%) 

Query: 21 AITFLCLLIPTFSFSFTLRLRTSLLFLIIVVTLQCFVKVSLKTWAKVNLISFVMGLSLFL 80
++ F+ I + S L T+L+ + + ++ +K + + F+ ++F

Sbjct: 4 SLKFILAFIISLEISLKASLTTNLIVIAFALIYLLVTRIKIKELILLIAVPFIASFTIFA 63

Query: 81 GTYFWGKLPHQFVLASLVACRPLIFMNVGLLFHASHSNYDFIESLYQTFKVPSHFAYGIF 140
+++ P + +L + R ++ + + DF SL Q +PS FAYG+ 

Sbjct: 64 TLFWFSPTPDAYYAra^-STRVYWTLTIACTTRNTTATDFARSLEQNLHLPSKFAYGVL 122

Query: 141 AVFNLLPLIKLQYQRNRLAFRLKNQVTWALSPRLILSVLLKTIYWVEQLELAMLSKGFEV 200 

A N++P +K ++ R + ++ SP L +L + + L M S G+ 

Sbjct: 123 AAINIIPRMKTAVKQIRTSAMMRGMYLSFWSPVLYFKAILVALNSADNLAQGMESHGYVE 182 



Query: 201 GKERTHASTYPVRFRDYSL-LGMSILLSIGM-IFK 233

G++R P+ +D+ + + IL++I + IFK 

Sbjct: 183 GQKRATIVAIPLTKKDWLIFFTLLILVNISLFIFK 217



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8807> and protein <SEQ ID 8808> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 4.50 
GvH: Signal Score (-7.5): -0.2

Possible site: 35 
>>> Seems to have a cleavable N-term signal seq.



ALOM program count: 5 value: -7.06 threshold: 0.0

INTEGRAL Likelihood = -7.06 Transmembrane 43 - 59 ( 36 - 63)
INTEGRAL Likelihood = -2.55 Transmembrane 92 - 108 ( 92 - 112)
INTEGRAL Likelihood = -1.49 Transmembrane 135 - 151 ( 135 - 151)
INTEGRAL Likelihood = -1.06 Transmembrane 69 - 85 ( 69 - 85)
INTEGRAL Likelihood = -0.22 Transmembrane 216 - 232 ( 216 - 232)
PERIPHERAL Likelihood = 2.65 170
modified ALOM score: 1.91

*** Reasoning Step: 3
Final Results

bacterial membrane Certainty=0.3824 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1438 

A DNA sequence (GBSx1524) was identified in S.agalactiae <SEQ ID 4413> which encodes the amino
acid sequence <SEQ ID 4414>. Analysis of this protein sequence reveals the following:

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3007 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA91229 GB:Z56283 orf1 [Lactobacillus helveticus]
Identities = 123/424 (29%), Positives = 200/424 (47%), Gaps = 48/424 (11%)

Query: 17 LFDEVTFSLNPGERILISGYSGCGKSTLALLLSGL--KESGK--GQVLLNGSLIEPSDVG 72
L +++ ++ PG +LI G +GCGKSTL +++GL K +GK G++ L+G
Sbjct: 12 LINQLNMNIAPGFNLLI-GPTGCGKSTLIiKIlAGLYPKYAGKLTGKIDLHGQ KAA 65

Query: 73 FLFQNPDLQFCMDTVAHELYFILENLQIEPEQMQDRSEFVLAQVGLKGFQNRLIYTLSQG 132
+FQN QF M T E+ F LENLQI+ + + + + ++ I TLS G
Sbjct: 66 MMFQNAAEQFTMTTPREEIIFALENLQIKAKDYDLHIKKAVEFTKIADLLDQKINTLSGG 125

Query: 133 EKQRLALATIFLKSPKLIILDEAFANLDQESASQLLQLVLNYQANNQSMLIVIDHLITYY 192
++Q +ALA + + +LDE FA+ D + L++ + + ++ +I+ DH++ Y
Sbjct: 126 QQQHVALAVLIAMDVDVFLLDEPFASCDPNTRHFLIEKLASLAETGRT-IILSDHVLDDY 184

Query: 193 QDIMDHYFWLEKRLTRVNFDYMLNRLNVFELEKKSHN TGDKLLSIKDFQVK- 243
+ I DH + E + + N+L F+ K+ H TG + + Q+K
Sbjct: 185 EKICDHLYQFEGKTVKELSANEKNKL--FKQNKQFHEQSYSFALPTGTPVFELNKTQIKQ 242

Query: 244 LSKNKFI SYLDFDLASGERLCLDGPSGVGKSSLFMGLLGLYRTKGK KQ 291
L +NK Y G+ + G +GVGK+SLF + + KG +
Sbjct: 243 NRLLLKQNKLKIY GKTTLITGSNGVGKTSLFKAMTKMIPYKGNFTYLDNEISK 295

Query: 292 FTHRKQIP-ISFLFQNPLDQFIFSTVYDEIFQVCKDSN KARDILETINLWDKKQ 344
+RK + I+ FQ DQF+ TV DEI KD N K + LE + L
Sbjct: 296

Query: 345
+ LS GQQ++L I +L + +LL+DEP G D +++ L+
Sbjct: 356

Query: 405
SH
Sbjct: 416 ISH 419

Identities = 44/185 (23%), Positives = 83/185 (44%), Gaps = 24/185 (12%)

Query: 28 GERILISGYSGCGKSTLALLLSGLKESGKGQVLLNGSLIEP SDVGFLFQNPDLQ 81
G+ LI+G +G GK++L ++ + L+ + + S + FQ Q
Sbjct: 256 GKTTLITGSNGVGKTSLFKAMTKMIPYKGNFTYLDNEISKIKYRKYLSQIAQFFQKASDQ 315

Query: 82 FCMDTVAHELYFILENLQIEPEQMQDRSEFV LAQVGLKGFQNRLIYTLSQGE 133
F TV E+ +DR+ F L ++ LK ++++Y+LS G+
Sbjct: 316 FLTVTVKDEIEL SKKDRNNFFTDAKIDEWLEKLQLKQHLDQWYSLSGGQ 365

Query: 134 KQRLALATIFLKSPKLIILDEAFANLDQESASQLLQLVLNYQANNQSMLIVIDHLITYYQ 193

+++L + + + ++++DE + LD ES +LQL+ Q Q ++I H I 
Sbjct: 366 QKZLQILLMLMTKHmniiLIDEPLSGLDHESVDLVLQLMQECQEKLQQTFLIISHQIDALA 425 

Query: 194 DIMDH 198 

D D+ 
Sbjct: 426 DFCDY 430 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4415> which encodes the amino acid
sequence <SEQ ID 4416>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3093 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 120/455 (26%), Positives = 203/455 (44%), Gaps = 47/455 (10%)

Query: 1 MLSVEKLACTHGDSHYLFDEV-TFSLNPGERILISGYSGCGKSTLALLLSGLKE SGK 56
M+S E+L T+ D ++ T + G+ I++ G SG GKST LL+G+ +GK
Sbjct: 21 MISAEQLVFTYHDQKWPACQISTCQIASGQFIVLOGPSGSGKSTFLKLLNGIIPDYYAGK 80

Query: 57 GQVLLNGSLIEPS-DVGFLFQNPDLQFCMDTVAHELYFILENLQIEPEQMQD 107
+ L+ + + V +FQNP QF V HEL F EN ++ + +
Sbjct: 81 YEGRLDVADCQAGRDSWTFSRSVASVFQNPASQFFYREVQHELVFPCENQGLDAKVIMK 140

Query: 108 RSEFVLAQVGLKGFQNRLIYTLSQGEKQRLALATIFLKSPKLIILDEAFANLDQESASQL 167
R + N+ ++ LS G+KQR+A+AT ++ +++ DE ANLD + +
Sbjct: 141 RLWTLAEDFAFAELLNKDMFGLSGGQKQRVAIATAIMQGTNIMLFDEPTANLDSAGIAAV 200

Query: 168 LQLVLNYQANNQSMLIVIDHLITYYQDIMDHYFW LEKRLTRVNF DY 213
+ +A ++ +IV +H + Y D+ D++F+ L +LT N D
Sbjct: 201 KAYLTQLKAAGKT-IIVAEHRLHYLMDLADNFFYFKNGRLTDKLTTQNLLALTDEQRQDM 259

Query: 214 MLNRLNVFELE KKSHNTGDKLLS IKDFQVKLSKNKFI SYLDFDLASGERLCLD 266
L RL++ +L+ + H D L I+ V+ A G +
Sbjct: 260 GLRRLDLSDLKPVLAGKIESQHYRPDDSLCIEHLTVRAGSKILRCIEQLSFAVGSISGIT 319

Query: 267 GPSGVGKSSLFMGLLGLYRTKGKKQFTHRKQIPISFLFQNPLDQFIFSTVYDEIF--QVC 324
G +G+GKS L + G+ KK + IP+S + + V ++F V
Sbjct: 320 GSNGLGKSQLVYYIAGI--LDDKKATIKFQGIPLSAKQRLSKTSIVLQEVSLQLFAESVS 377

Query: 325 KDSN KARDILETINLWDKKQFSPFQLSQGQQRRLAIGSILASDSKLLLLDEPT 377
K+ N + +++E ++L + P LS G+Q+R+ I + L +D +L+ DEP+
Sbjct: 378 KEVNLGHERHPRTTEVIERLSLTTLLERHPASLSGGEQQRVMIAASLLADKDILIFDEPS 437

Query: 378 YGQDAYHANMITTLLLSYCHKNHCGVIFTSHDPHL 412
G D + LL+ H VI SHD L
Sbjct: 438 SGLDLLQMKALANLLMQ-LKTQHKWILISHDEEL 471

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1439

A DNA sequence (GBSx1525) was identified in S.agalactiae <SEQ ID 4417> which encodes the amino
acid sequence <SEQ ID 4418>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -11.62 Transmembrane 8 - 24 ( 1 - 30)
INTEGRAL Likelihood = -8.17 Transmembrane 145 - 161 ( 143 - 163)
INTEGRAL Likelihood = -6.32 Transmembrane 66 - 82 ( 62 - 84)
INTEGRAL Likelihood = -3.77 Transmembrane 112 - 128 ( 111 - 132)
INTEGRAL Likelihood = -2.66 Transmembrane 43 - 59 ( 43 - 59)

Final Results

bacterial membrane Certainty=0.5649 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13180 GB:Z99110 ykoE [Bacillus subtilis]

Identities = 68/177 (38%), Positives = 117/177 (65%), Gaps = 1/177 (0%)

Query: 5 LKDVLLIALLAWLGVVYFGAGYISNAFVPFVGPIAHEVIYGIWFVAGPMALYILRKPGT 64
+K++++++++++V WY + N GPIA+E IYGIWF+ +A Y++RKPG
Sbjct: 6 VKEIVIMSVISIVFAWYLLFTHFGNVLAGMFGPIAYEPIYGIWFIVSVIAAYMIRKPGA 65

Query: 65 AIVAELLAALIEVLIGSIYGPSVLVIGTLQGLGSELGFTLFRYHNYKLPAFILSAILTSI 124 

A+V+E++AAL+E L+G+ GP V+VIG +QGLG+E F R+ Y LP +L+ + +S+ 
Sbjct: 66 ALVSEIIAALVECLLGNPSGPMVIVIGIVQGLGAEAVFLATRWKAYSLPVLMLAGMGSSV 125 

Query: 125 FSFAWSFYANGLSAFSFSYNILMLIVRTVS-SIIFFLLTKNICDQLHRSGVLNAYGI 180
SF + + +G +A+S Y ++ML++R +S +++ LL K + L +GVLN +
Sbjct: 126 ASFIYDLFVSGYAAYSPGYLLIMLVIRLISGALLAGLLGKAVSGSLAYTGVLNGMAL 182

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1440

A DNA sequence (GBSx1526) was identified in S.agalactiae <SEQ ID 4419> which encodes the amino
acid sequence <SEQ ID 4420>. Analysis of this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.69 Transmembrane 65 - 81 ( 53 - 95)
INTEGRAL Likelihood = -6.37 Transmembrane 34 - 50 ( 31 - 54)
INTEGRAL Likelihood = -6.10 Transmembrane 176 - 192 ( 169 - 195)
INTEGRAL Likelihood = -3.66 Transmembrane 130 - 146 ( 130 - 151)
INTEGRAL Likelihood = -1.97 Transmembrane 3 - 19 ( 3 - 19)
INTEGRAL Likelihood = -0.90 Transmembrane 88 - 104 ( 88 - 104)

Final Results

bacterial membrane Certainty=0.3675 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9757> which encodes amino acid sequence <SEQ ID 9758>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
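The INTEGRAL lines listed for GBSx1526 follow a fixed layout: a likelihood score, a core segment and, in parentheses, a wider range. The following is a minimal Python sketch for tabulating them; it is a hypothetical reading aid, not part of PSORT or ALOM, and its regular expression assumes the line layout exactly as printed in this document. Counting the segments whose likelihood falls below 0.0 gives 6 for GBSx1526, and the lowest likelihood is -6.69; these agree with the "count: 6", "value: -6.69" and "threshold: 0.0" figures reported by the ALOM program for the related protein below, which suggests, but does not establish, how those summary figures relate to the individual lines.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for one INTEGRAL line as printed in this document, e.g.
# "INTEGRAL Likelihood = -6.69 Transmembrane 65 - 81 ( 53 - 95)".
# Spacing around "=", "-" and the parentheses is optional because the printed
# layout varies (for example "=-11.62" and "( 1-30)").
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


@dataclass
class TMSegment:
    likelihood: float  # printed likelihood; more negative is more TM-like
    start: int         # first residue of the core segment
    end: int           # last residue of the core segment (inclusive)
    ext_start: int     # first residue of the parenthesised range
    ext_end: int       # last residue of the parenthesised range

    @property
    def core_length(self) -> int:
        return self.end - self.start + 1


def parse_integral(lines):
    """Turn every INTEGRAL line into a TMSegment, keeping input order."""
    segments = []
    for line in lines:
        m = INTEGRAL_RE.search(line)
        if m:
            lik, s, e, xs, xe = m.groups()
            segments.append(TMSegment(float(lik), int(s), int(e), int(xs), int(xe)))
    return segments


def count_below(segments, threshold=0.0):
    """Number of segments whose likelihood lies below the threshold."""
    return sum(1 for seg in segments if seg.likelihood < threshold)


# The six INTEGRAL lines reported for GBSx1526 (SEQ ID 4420).
gbsx1526 = [
    "INTEGRAL Likelihood = -6.69 Transmembrane 65 - 81 ( 53 - 95)",
    "INTEGRAL Likelihood = -6.37 Transmembrane 34 - 50 ( 31 - 54)",
    "INTEGRAL Likelihood = -6.10 Transmembrane 176 - 192 ( 169 - 195)",
    "INTEGRAL Likelihood = -3.66 Transmembrane 130 - 146 ( 130 - 151)",
    "INTEGRAL Likelihood = -1.97 Transmembrane 3 - 19 ( 3 - 19)",
    "INTEGRAL Likelihood = -0.90 Transmembrane 88 - 104 ( 88 - 104)",
]
segs = parse_integral(gbsx1526)
print(count_below(segs), min(s.likelihood for s in segs))  # 6 -6.69
```

Every core segment in this list spans 17 residues (for example 65 - 81), so `core_length` is a quick consistency check on transcribed coordinates.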

A related GBS gene <SEQ ID 8809> and protein <SEQ ID 8810> were also identified. Analysis of this 
protein sequence reveals the following:
Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -4.09
GvH: Signal Score (-7.5): -4.38

Possible site: 47
>>> Seems to have no N-terminal signal sequence
ALOM program count: 6 value: -6.69 threshold: 0.0
INTEGRAL Likelihood = -6.69 Transmembrane 65 - 81 ( 53 - 95)
INTEGRAL Likelihood = -6.37 Transmembrane 34 - 50 ( 31 - 54)
INTEGRAL Likelihood = -6.10 Transmembrane 176 - 192 ( 169 - 195)
INTEGRAL Likelihood = -3.66 Transmembrane 130 - 146 ( 130 - 151)
INTEGRAL Likelihood = -1.97 Transmembrane 3 - 19 ( 3 - 19)
INTEGRAL Likelihood = -0.90 Transmembrane 88 - 104 ( 88 - 104)
PERIPHERAL Likelihood = 5.30 158
modified ALOM score: 1.84

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.3675 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1441

A DNA sequence (GBSx1527) was identified in S.agalactiae <SEQ ID 4421> which encodes the amino
acid sequence <SEQ ID 4422>. Analysis of this protein sequence reveals the following:
Possible site: 23 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8811> and protein <SEQ ID 8812> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 6.01 
GvH: Signal Score (-7.5): 0.45 

Possible site: 23 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 10.66 threshold: 0.0 
PERIPHERAL Likelihood = 10.66 80 
modified ALOM score: -2.63 

*** Reasoning Step: 3 



Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

SEQ ID 4422 (GBS19) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 4 (lane 4; MW 24kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 9 (lane 6; MW 46.1kDa). 

The GST-fusion protein was purified as shown in Figure 190, lane 10. 
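Each "Final Results" block gives one certainty value per compartment, marked either "Affirmative" or "Not Clear". The minimal Python sketch below reads such a block and returns the Affirmative compartment with the highest certainty, or None when every entry is Not Clear (as for uridine kinase in Example 1449). It is a hypothetical helper, not part of PSORT; the pattern assumes the line layout printed in this document and tolerates the irregular spacing seen in some lines (for example "Certainty=0 .3000") and an optional "---" separator.

```python
import re

# Hypothetical pattern for one "Final Results" line as printed in this
# document, e.g. "bacterial outside Certainty=0.3000 (Affirmative)".
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\b.*?"
    r"Certainty\s*=\s*(\d+)\s*\.\s*(\d+)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(lines):
    """Map each compartment to (certainty, status)."""
    results = {}
    for line in lines:
        m = RESULT_RE.search(line)
        if m:
            site, whole, frac, status = m.groups()
            results[site] = (float(f"{whole}.{frac}"), status)
    return results


def predicted_site(results):
    """Return the Affirmative compartment with the highest certainty, or None."""
    affirmative = {site: c for site, (c, status) in results.items()
                   if status == "Affirmative"}
    if not affirmative:
        return None
    return max(affirmative, key=affirmative.get)


# Final Results for SEQ ID 4422 (GBS19), trailing marker omitted and the
# irregular printed spacing kept on purpose.
gbs19 = [
    "bacterial outside Certainty=0 .3000 (Affirmative)",
    "bacterial membrane Certainty=0 . 0000 (Not Clear)",
    "bacterial cytoplasm Certainty=0 . 0000 (Not Clear)",
]
print(predicted_site(parse_final_results(gbs19)))  # outside
```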
Example 1442 

A DNA sequence (GBSx1528) was identified in S.agalactiae <SEQ ID 4423> which encodes the amino
acid sequence <SEQ ID 4424>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8813> which encodes amino acid sequence <SEQ ID 8814>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
SRCFLG: 0
McG: Length of UR: 23
Peak Value of UR: 2.61
Net Charge of CR: 3
McG: Discrim Score: 9.08
GvH: Signal Score (-7.5): -0.76
Possible site: 22
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 23
ALOM program count: 0 value: 5.14 threshold: 0.0
PERIPHERAL Likelihood = 5.14 365
modified ALOM score: -1.53



*** Reasoning Step: 3 
Rule gpol 



Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA34476 GB:X16457 precursor polypeptide (AA -26 to 632)
[Staphylococcus aureus]
Identities = 93/372 (25%), Positives = 160/372 (43%), Gaps = 46/372 (12%)

Query: 9 MKKQFLKSAAILSLAVTAVSTSQPVGAIVGKDETKLRQQLGYIDSKKSGKKIDERWGEKI 68 

MKKQ + A L++A + + AIV KD+K + + KG+ + +KI 
Sbjct: 1 MKKQIISLGA-LAVASSLFTWDNKADAIVTKDYSK ESRVNEKSKKGATVSDYYYWKI 56 

Query: 69 YNYLSYELIEANEWINRSEFQEPEYRTILSEFKDKIDSIEYYLINLS NIAKEDAHQ 124
+ L + A + + ++ +P Y+ ++ + YL+ + K+
Sbjct: 57 IDSLEAQFTGAIDLLENYKYGDPIYKEAKDRLMTRVLGEDQYLLKKKIDEYELYKKWYKS 116

Query: 125 RNILQSLDKYEKSGIYNLDQGVYNYIYQEISSAKHKFSDGVDKIYRLDSTLFPFSWYDK 184
N ++ + K +YNL YN I+ + A ++F+ V +I + L F
Sbjct: 117 SNKNTNMLTFHKYNLYNLTMNEYNDIFNSLKDAWQFNKEVKEIEHKNVDLKQF 170

Query: 185 HLDNNDNYKDNKBFKEYIALLNEITRKARLGYQIVNNHKD-GEHKDEAEI-LDILIRDIT 242
D ++K KE L++EI Y KD GEH E LD+++ D
Sbjct: 171 DKDGEDKATKEVYDLVSEIDTLVVTYYA DKDYGEHAKELRAKLDLILGDTD 221

Query: 243 FVSKDAPGYKYI PNKRIAAKI IEDLDGI INDFFKNTGKDKP- SLEKLKDTEFHKKYLNST 301
K I N+RI ++I+DL+ II+DFF T +++P S+ K T+ + K +
Sbjct: 222 NPHK -ITNERIKKEMIDDLNSIIDDFFMETKQNRPHSITKYDPTKHNFKEKSEN 274

Query: 302 EPYSIETNLPSNYKELKEKQIKKLEYGYK-KSSKIY--TSAHYALYSEEIDAAKELLQKV 358
+P N +E K K +K+ + +K K+ K Y T + EE + L KV
Sbjct: 275 KP NFDKLVEETK-KAVKEADESWKNKTVKKYEETVTKSPWKEEKKVEEPQLPKV 328

Query: 359 KIAKDNYNEIKS 370
N E+K+
Sbjct: 329 GNQQEVKT 336

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8814 (GBS119) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 2; MW 84.3kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 35 (lane 5; 2 bands).

The GBS119-GST fusion product was purified (Figure 109A; see also Figure 201, lane 6) and used to
immunise mice (lane 1+2+3 product; 20 µg/mouse). The resulting antiserum was used for Western blot,
FACS (Figure 109B), and in the in vivo passive protection assay (Table III). These tests confirm that the
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1443 

A DNA sequence (GBSx1529) was identified in S.agalactiae <SEQ ID 4425> which encodes the amino
acid sequence <SEQ ID 4426>. This protein is predicted to be S-adenosylmethionine synthetase (metK).
Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3609 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07019 GB:AP001518 S-adenosylmethionine synthetase [Bacillus halodurans] 
Identities = 266/390 (68%), Positives = 324/390 (82%), Gaps = 1/390 (0%)

Query: 4 RKLFTSESVSEGHPDKIADQISDAILDAILEQDPDAHVAAETAVYTGSVHVFGEISTTAY 63
R+LFTSESV+EGHPDKI DQISD+ILD IL++DP+A VA ET+V TG V V GEI+T+ Y
Sbjct: 7 RRLFTSESVTEGHPDKICDQISDSILDEILKEDPNARVACETSVTTGLVLVAGEITTSTY 66

Query: 64 VDINRVVRNTIAEIGYDKAEYGFSAESVGVHPSLVEQSPDIAQGVNEALEVR-GSLEQDP 122
VDI +VVR+TI IGY +A+YGF +E+ V S+ EQSPDIAQGVN+ALE R G +
Sbjct: 67 VDIPKVVRDTIRNIGYTRAKYGFDSETCAVLTSIDEQSPDIAQGVNQALEAREGQMTDAE 126

Query: 123 LDLIGAGDQGLMFGFAVDETPELMPLPISLAHQLVKKLTDLRKSGELTYLRPDAKSQVTV 182
++ IGAGDQGLMFG+A +ETPELMPLPISL+H+L ++L++ RK L YLRPD K+QVTV
Sbjct: 127 IEAIGAGDQGLMFGYANNETPELMPLPISLSHKLARRLSEARKGEILPYLRPDGKTQVTV 186

Query: 183 EYDENDQPIRVDAVVISTQHDPNVTNDQLHKDVIEKVINEVIPSHYLDDQTKFFINPTGR 242
EYDENDQ +R+D +VISTQH P VT +Q+ D+ + VI V+P +D++TK+FINPTGR
Sbjct: 187 EYDENDQSVRIDTIVISTQHHPEVTLEQIESDLKQHVIRSVVPEELIDEETKYFINPTGR 246

Query: 243 FVIGGPQGDSGLTGRKIIVDTYGGYSRHGGGAFSGKDATKVDRSASYAARYIAKNIVAAD 302
FVIGGPQGD+GLTGRKIIVDTYGGY+RHGGGAFSGKD TKVDRS +YAARY+AKNIVAA
Sbjct: 247 FVIGGPQGDAGLTGRKIIVDTYGGYARHGGGAFSGKDPTKVDRSGAYAARYVAKNIVAAG 306

Query: 303 LAKKVEVQLAYAIGVAQPVSVRVDTFGTGVIAEADLEAAVRQIFDLRPAGIINMLDLKRP 362
LA K EVQLAYAIGVA+PVS+ +DTFGTG ++EA L VR+ FDLRPAGII MLDL+RP
Sbjct: 307 LADKCEVQLAYAIGVAKPVSISIDTFGTGQVSEARLVELVREHFDLRPAGIIKMLDLRRP 366

Query: 363 IYRQTAAYGHMGRTDIDLPWERVDKVQALK 392
IY+QTAAYGH GRTD++LPWE+ DK + L+
Sbjct: 367 IYKQTAAYGHFGRTDVELPWEQTDKAEILR 396


A related DNA sequence was identified in S.pyogenes <SEQ ID 4427> which encodes the amino acid 
sequence <SEQ ID 4428>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3389 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 333/395 (84%), Positives = 361/395 (91%), Gaps = 1/395 (0%)

Query: 1 MSERKLFTSESVSEGHPDKIADQISDAILDAILEQDPDAHVAAETAVYTGSVHVFGEIST 60
MSERKLFTSESVSEGHPDKIADQISDAILDAIL +DP+AHVAAET VYTGSVHVFGEIST
Sbjct: 1 MSERKLFTSESVSEGHPDKIADQISDAILDAILAEDPEAHVAAETCVYTGSVHVFGEIST 60

Query: 61 TAYVDINRVVRNTIAEIGYDKAEYGFSAESVGVHPSLVEQSPDIAQGVNEALEVRGSLEQ 120
TAY+DINRVVR+TIAEIGY +AEYGFSAESVGVHPSLVEQS DIAQGVNEA E R +
Sbjct: 61 TAYIDINRVVRDTIAEIGYTEAEYGFSAESVGVHPSLVEQSGDIAQGVNEAFESREG-DT 119

Query: 121 DPLDLIGAGDQGLMFGFAVDETPELMPLPISLAHQLVKKLTDLRKSGELTYLRPDAKSQV 180
D L IGAGDQGLMFGFA+ +ETPELMPLPISL+HQLV+ +L +LRKSGE++YLRPDAKSQV
Sbjct: 120 DDLSHIGAGDQGLMFGFAINETPELMPLPISLSHQLVRRLAELRKSGEISYLRPDAKSQV 179

Query: 181 TVEYDENDQPIRVDAVVISTQHDPNVTNDQLHKDVIEKVINEVIPSHYLDDQTKFFINPT 240
TVEYDE+D+P+RVD VVISTQHDP TNDQ+ +DVIEKVI VIP+ YLDD TKFFINPT
Sbjct: 180 TVEYDEHDKPVRVDTVVISTQHDPEATNDQIRQDVIEKVIKAVIPADYLDDDTKFFINPT 239

Query: 241 GRFVIGGPQGDSGLTGRKIIVDTYGGYSRHGGGAFSGKDATKVDRSASYAARYIAKNIVA 300
GRFVIGGPQGDSGLTGRKIIVDTYGGYSRHGGGAFSGKDATKVDRSASYAARYIAKN+VA
Sbjct: 240 GRFVIGGPQGDSGLTGRKIIVDTYGGYSRHGGGAFSGKDATKVDRSASYAARYIAKNLVA 299

Query: 301 ADLAKKVEVQLAYAIGVAQPVSVRVDTFGTGVIAEADLEAAVRQIFDLRPAGIINMLDLK 360
A L K EVQLAYAIGVAQPVSVRVDTFGT + EA LEAAVRQ+FDLRPAGII MLDLK
Sbjct: 300 AGLVTKAEVQLAYAIGVAQPVSVRVDTFGTSTVPEAVLEAAVRQVFDLRPAGIIQMLDLK 359

Query: 361 RPIYRQTAAYGHMGRTDIDLPWERVDKVQALKDFI 395
RPIY+QTAAYGHMGRTDIDLPWER++KV AL + +
Sbjct: 360 RPIYKQTAAYGHMGRTDIDLPWERLNKVDALVEAV 394





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
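The "Identities", "Positives" and "Gaps" figures above each alignment are ratios over the alignment length, so the printed percentages can be re-derived. The hypothetical Python helper below does this and flags any field whose printed percentage differs from the value rounded down. Rounding down is an assumption, chosen because it fits lines in this section such as 44/185 printed as 23% (rounding to nearest would give 24%). For the GAS/GBS metK alignment above (333/395, 361/395, 1/395) all three printed values agree; a line such as 324/390 printed as 82% does not fit either convention and may reflect an extraction error, which the helper can only flag.

```python
import re

# Hypothetical pattern for one field of a summary line, e.g.
# "Identities = 333/395 (84%), Positives = 361/395 (91%), Gaps = 1/395 (0%)".
FIELD_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_summary(line):
    """Map each field name to (count, alignment_length, printed_percent)."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in FIELD_RE.findall(line)}


def truncated_percent(count, length):
    """Percentage rounded down to a whole number."""
    return (100 * count) // length


def check_summary(line):
    """Return {field: (printed, recomputed)} for fields that disagree."""
    return {name: (printed, truncated_percent(n, d))
            for name, (n, d, printed) in parse_summary(line).items()
            if printed != truncated_percent(n, d)}


gas_gbs_metk = "Identities = 333/395 (84%) , Positives = 361/395 (91%) , Gaps = 1/395 (0%)"
print(check_summary(gas_gbs_metk))  # {}
```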
Example 1444

A DNA sequence (GBSx1530) was identified in S.agalactiae <SEQ ID 4429> which encodes the amino
acid sequence <SEQ ID 4430>. This protein is predicted to be a transcriptional repressor of the biotin
operon. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 188 - 204 ( 188 - 204)

Final Results

bacterial membrane Certainty=0.1065 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9755> which encodes amino acid sequence <SEQ ID 9756>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05404 GB:AP001512 transcriptional repressor of the biotin
operon [Bacillus halodurans]
Identities = 102/315 (32%), Positives = 169/315 (53%), Gaps = 18/315 (5%)

Query: 10 ILSKNNNFISGETMANQLNISRTAIWKGIKTLEELGLEIESVTNKGYRLVSG-DILLPEQ 68
+L+ ++F+SGE ++ + SRTA+WK I+ L + G E+E+V KGYR+V D + P
Sbjct: 9 LLTAGDDFVSGEKISQAIGCSRTAVWKHIEELRKSGYEVEAVQRKGYRIVKRPDQIKPHD 68

Query: 69 LE QEIGIKVSLNNNSASTQLDAKMGIESKLKTPHLFLAPNQKKAKGRFDRPFFTS 123
++ + G +++ ++ASTQ A + K H+ LA Q KGR R +++
Sbjct: 69 IQVVLETERFGREITYLESTASTQTVALKIAQEGAKEGHIVLANEQTSGKGRMGRGWYSP 128

Query: 124 NQGGIYMSLLLQPNVPIEDIKPYTVMVASSAVKAISRLTGITPEIKWVNDIYLDNKKIAG 183
I MS++ +P +P + T++ A + V+AI TG+ +IKW ND+ +D KKI G
Sbjct: 129 PGSSISMSIIFRPQLPPQKAPQLTLLTAVAIVRAIKETTGLDSDIKWPNDLLIDGKKIVG 188

Query: 184 ILTEAIASVESGLVTNVIIGLGINFYIKE--FPRALTKRAGSLFTEQ-PTITRNQLITEI 240
ILTE A +S V +VI G+GIN +E F + K A SL ++ I R LI I
Sbjct: 189 ILTEMQADQDS--VHSVIQGIGINVNHQEEAFAEEIRKIATSLAIKKGEPIQRAPLIAAI 246

Query: 241 W NLFFNI PLEDHLK VYREKSLVLDRTVSFMDGQTMYSGKAIDITDKGYLVVEL 293
LF+++ L+ ++++ + + + 1- G A ITD G L++E
Sbjct: 247 LKNIELFYDLYLQHGFSRIKPLWEAHAISIGKRIRARMLNDVKFGVAKGITDDGVLLLED 306



Query: 294 DDGQLKTLRSGEISL 308
DDG+L ++ S +I +
Sbjct: 307 DDGKLHSIYSADIEI 321

A related DNA sequence was identified in S.pyogenes <SEQ ID 4431> which encodes the amino acid
sequence <SEQ ID 4432>. Analysis of this protein sequence reveals the following:

Possible site: 34
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 194 - 210 ( 194 - 211) 



Final Results 

bacterial membrane Certainty=0.1595 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05404 GB:AP001512 transcriptional repressor of the biotin
operon [Bacillus halodurans]
Identities = 98/315 (31%), Positives = 165/315 (52%), Gaps = 18/315 (5%)

Query: 10 LLSQTDDFVSGEYLADQLSISRTSVWKSIKSLEMQGIQIDSLKHKGYRMVQG-DILLPKT 68
LL+ DDFVSGE ++ + SRT+VWK I+ L G ++++++ KGYR+V+ D + P
Sbjct: 9 LLTAGDDFVSGEKISQAIGCSRTAVWKHIEELRKSGYEVEAVQRKGYRIVKRPDQIKPHD 68

Query: 69 I SQGLGMPVTYTPHSQSTQLDAKQGIEAHNSAPRLYLAPSQEAAKGRLDRQFFSA 123
I ++ G +TY + STQ A + + + LA Q + KGR+ R ++S
Sbjct: 69 IQVVLETERFGREITYLESTASTQTVALKIAQEGAKEGHIVLANEQTSGKGRMGRGWYSP 128

Query: 124 STGGIYMSMYLKPNVPYADMPPYTMMVASSIVKAISRLTGIDTEIKWVNDIYLGNHKVAG 183
I MS+ +P +P P T++ A +IV+AI TG+D++IKW ND+ + K+ G
Sbjct: 129 PGSSISMSIIFRPQLPPQKAPQLTLLTAVAIVRAIKETTGLDSDIKWPNDLLIDGKKIVG 188

Query: 184 ILTEAITSVETGLITDVIIGVGLNFFVTD--FPEAIAQKAGSLFTEK-PTITRNDLIIDI 240
ILTE + + VI G+G+N +FEI+ASL+K I R LI I
Sbjct: 189 ILTE--MQADQDSVHSVIQGIGINVNHQEEAFAEEIRKIATSLAIKKGEPIQRAPLIAAI 246

Query: 241 WK LFLSIPVKDHVKVYKEKSLVLNKQVTFIENSQEKRAIAIDLTDQGHLIVQF 293
K L+L +++ ++ + K++ + K +A +TD G L+++
Sbjct: 247 LKNIELFYDLYLQHGFSRIKPLWEAHAISIGKRIRARMLNDVKFGVAKGITDDGVLLLED 306

Query: 294 ENGDLQTLRSGEISL 308
++G L ++ S +I +
Sbjct: 307 DDGKLHSIYSADIEI 321

An alignment of the GAS and GBS proteins is shown below.
Identities = 191/311 (61%), Positives = 257/311 (82%)

Query: 1 MKTYEKIYQILSKNNNFISGETMANQLNISRTAIWKGIKTLEELGLEIESVTNKGYRLVS 60
MKT EKIYQ+LS+ ++F+SGE +A+QL+ISRT++WK IK+LE G++I+S+ +KGYR+V
Sbjct: 1 MKTSEKIYQLLSQTDDFVSGEYLADQLSISRTSVWKSIKSLENQGIQIDSLKHKGYRMVQ 60

Query: 61 GDILLPEQLEQEIGIKVSLNNNSASTQLDAKMGIESKLKTPHLFLAPNQKKAKGRFDRPF 120
GDILLP+ + Q +G+ V+ +S STQLDAK GIE+ P L+LAP+Q+ AKGR DR F
Sbjct: 61 GDILLPKTISQGLGMPVTYTPHSQSTQLDAKQGIEAHNSAPRLYLAPSQEAAKGRLDRQF 120

Query: 121 FTSNQGGIYMSLLLQPNVPIEDIKPYTVMVASSAVKAISRLTGITPEIKWVNDIYLDNKK 180
F+++ GGIYMS+ L+PNVP D+ PYT+MVASS VKAISRLTGI EIKWVNDIYL N K
Sbjct: 121 FSASTGGIYMSMYLKPNVPYADMPPYTMMVASSIVKAISRLTGIDTEIKWVNDIYLGNHK 180

Query: 181 IAGILTEAIASVESGLVTNVIIGLGINFYIKEFPRALTKRAGSLFTEQPTITRNQLITEI 240
+AGILTEAI SVE+GL+T+VIIG+G+NF++ +FP A+ ++AGSLFTE+PTITRN LI +I
Sbjct: 181 VAGILTEAITSVETGLITDVIIGVGLNFFVTDFPEAIAQKAGSLFTEKPTITRNDLIIDI 240

Query: 241 WNLFFNIPLEDHLKVYREKSLVLDRTVSFMDGQTMYSGKAIDITDKGYLVVELDDGQLKT 300
W LF +IP++DH+KVY+EKSLVL++ V+F++ AID+TD+G+L+V+ ++G L+T
Sbjct: 241 WKLFLSIPVKDHVKVYKEKSLVLNKQVTFIENSQEKRAIAIDLTDQGHLIVQFENGDLQT 300

Query: 301 LRSGEISLSSW 311
LRSGEISLSSW
Sbjct: 301 LRSGEISLSSW 311

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1445

A DNA sequence (GBSx1531) was identified in S.agalactiae <SEQ ID 4433> which encodes the amino
acid sequence <SEQ ID 4434>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.76 Transmembrane 3 - 19 ( 3 - 20)



Final Results 
bacterial membrane Certainty=0.2105 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1446

A DNA sequence (GBSx1532) was identified in S.agalactiae <SEQ ID 4435> which encodes the amino
acid sequence <SEQ ID 4436>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.28 Transmembrane 24 - 40 ( 24 - 40)

Final Results

bacterial membrane Certainty=0.1914 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4437> which encodes the amino acid
sequence <SEQ ID 4438>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -1.91 Transmembrane 58 - 74 ( 58 - 75) 



Final Results

bacterial membrane Certainty=0.1765 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 37/67 (55%), Positives = 54/67 (80%), Gaps = 3/67 (4%)

Query: 1 MTKRQFIFMMjLCSFETYFFNQSVMDGSWIFAIFWGvTiLLRDLQKVYAISKFTKELIK-- 58
MT RQF+FMA +C+FETYFFN ++ G+++FA+FWG+LL RDL++V+ I++ TK ++K
Sbjct: 36 MTIRQFLFMAFVCAFETYFFNDLLLSGNYLFALFWGLLLFRDLRRVHTINQLTKTILKTA 95

Query: 59 -STKKKD 64
S KKKD
Sbjct: 96 NSPKKKD 102

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1447

A DNA sequence (GBSx1533) was identified in S.agalactiae <SEQ ID 4439> which encodes the amino
acid sequence <SEQ ID 4440>. This protein is predicted to be DNA polymerase III, gamma subunit
(dnaZX). Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1567 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4441> which encodes the amino acid
sequence <SEQ ID 4442>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 232 - 248 ( 232 - 249)

Final Results

bacterial membrane Certainty=0.1235 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 408/558 (73%), Positives = 473/558 (84%), Gaps = 6/558 (1%)

Query: 1 MYQALYRKYRSQTFDEMVGQSVISTTLKQAVSSKKISHAYLFSGPRGTGKTSAAKIFAKA 60 

MYQALYRKYRSQTFDEMVGQSVISTTLKQAV S KISHAYLFSGPRGTGKTSAAKIFAKA
Sbjct: 1 MYQALYRKYRSQTFDEMVGQSVISTTLKQAVESGKISHAYLFSGPRGTGKTSAAKIFAKA 60

Query: 61 MNCPNQINGEPCNHCDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYK 120
MNCPNQ++GEPCN CDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYK
Sbjct: 61 MNCPNQVDGEPCNQCDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYK 120

Query: 121 VYIIDEVHMLSTGAFNALLKTLEEPTENVVFILATTELHKIPATILSRVQRFEFKAIKLL 180
VYIIDEVHMLSTGAFNALLKTLEEPTENVVFILATTELHKIPATILSRVQRFEFKAIK
Sbjct: 121 VYIIDEVHMLSTGAFNALLKTLEEPTENVVFILATTELHKIPATILSRVQRFEFKAIKQK 180

Query: 181 AIRDHLAQILDKEAISYDLDALTLVARRAEGGMRDALSILDQALSLAKDNHISLDVAEEI 240
AIR+HLA +LDKE I+Y++DAL L+ARRAEGGMRDALSILDQALSL+ DN +++ +AEEI
Sbjct: 181 AIREHLAWVLDKEGIAYEVDALNLIARRAEGGMRDALSILDQALSLSPDNQVAIAIAEEI 240

Query: 241 TGSISLSAIDDYVSNILAHDTTEALAKLEVIFDSGKSMSRFATDLLMYLRDLLVVQAGGE 300
TGSIS+ A+ DYV + T+ALA LE I+DSGKSMSRFATDLL YLRDLLVV+AGG+
Sbjct: 241 TGSISIIjALGDYvRWSQEQATQALAALETIYDSGKSMSRFATDLLTYLRDLLVVKAGGD 300

Query: 301 DSHSSDTFIANLNVKQDILFEMIDKVTSVLPEIKNGSHPKVYAEMMTIQLSEMVEKNSS- 359
+ S F NL++ D +F+MI VTS LPEIK G+HP++YAEMMTIQL++ + S
Sbjct: 301 NQRQSAVFDTNLSLSIDRIFQMITVVTSHLPEIKKGTHPRIYAEMMTIQLAQKEQILSQV 360

Query: 360 NIPADTCAELDSLRRELKSLKNEMSQL-SRADQSSSTQKVKVNNKTFTFJWDRTKILTIM 418
N+ ++ +E+++L+ EL LK ++SQL SR D + + K K KT +++VDR IL IM
Sbjct: 361 NLSGELISEIETLKNELAQLKQQLSQLQSRPDSLARSDKTK--PKTTSYRVDRVTILKIM 418

Query: 419 EETvATOSQRSREYLEALKSAWNEILDNITAQDRALLMGSEPVLANSENAIIAFDAAFNAE 478
EETV +SQ+SR+YL+ALK+AWNEILDNI+AQDRALLMGSEPVLANSENAILAF+AAFNAE
Sbjct: 419 EETVRNSQQSRQYLDALKNAWNEILDNISAQDRALLMGSEPVLANSENAILAFEAAFNAE 478

Query: 479 QAMKRTDLNDIFGNIMSKAAGFSPNILAVPRNDFNQIRSDFAKKMKAQK--TETEPEVNH 536
Q M R +LND+FGNIMSKAAGFSPNILAVPR DF IR +FA++MK+QK + E EV
Sbjct: 479 QVMSRNNLNDMFGNIMSKAAGFSPNILAVPRTDFQHIRKEFAQQMKSQKDSVQEEQEVAL 538

Query: 537 QIPEDFSYLAERIAIVED 554 

IPE F +L ++I ++D 
Sbjct: 539 DIPEGFDFLLDKINTIDD 556 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1448

A DNA sequence (GBSx1534) was identified in S.agalactiae <SEQ ID 4443> which encodes the amino
acid sequence <SEQ ID 4444>. Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have no N-terminal signal sequence (or aa 1-19)

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06927 GB:AP001518 unknown conserved protein [Bacillus halodurans]
Identities = 67/143 (46%), Positives = 96/143 (66%)

Query: 8 ENYQLLLLQAQALFSDETNALANLSNASAMLNAMLPNSVFTGFYLFDGEELILGPFQGGV 67
E Y L+ Q AL E++A+ANL+NASA+L I, + GFYL EL+LGPFQG
Sbjct: 13 EK^SLOTKQLAALLEGESDAIANLANASALLYHFLEEVNWVGFYLIKEGELVLGPFQGLP 72

Query: 68 SCVHITLGKGVCGESAQTAKTLIVDDVTKHANYISCDSKAMSEIVVPMFKNGKLLGVLDL 127
+CV I +G+GVCG +A+ +T+ V+DV + +I+CD+ + SEIV+P+F+NG L GVLD+
Sbjct: 73 ACVRIPIGRGVCGTAAKEEQTVRVEDVHQFPGHIACDAASRSEIVIPLFQNGVLYGVLDI 132

Query: 128 DSSLVADYDEIDQEYLEKFVGIL 150
DS + + E +Q LE FV +L

Sbjct: 133 DSPSLNRFSEEEQALLESFVDVL 155 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4445> which encodes the amino acid 

sequence <SEQ ID 4446>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1753 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 122/164 (74%), Positives = 144/164 (87%)

Query: 1 MNKSKKIENYQLLLLQAQALFSDETNALANLSNASAMLNAMLPNSVFTGFYLFDGEELIL 60
MNKSKKIE YQL++ QA+ LF++E+NALANLSNASA+LN LPNSVFTGFYLFDG+ELIL
Sbjct: 1 MNKSKKIEQYQLMIAQAKELFANESNALANLSNASALLNMTLPNSVFTGFYLFDGQELIL 60

Query: 61 GPFQGGVSCVHITLGKGVCGESAQTAKTLIVDDVTKHANYISCDSKAMSEIVVPMFKNGK 120
GPFQG VSCVHI LGKGVCGESAQ+ +T+I++DV +HANYISCD+ AMSEIVVPM KG
Sbjct: 61 GPFQGRVSCTHIKLGKGVCGESAQSRRTIIINDVKQHANYISCDAAAMSEIVVPMVKEGH 120

Query: 121 LLGVLDLDSSLVADYDEIDQEYLEKFVGILVEHTIWNLDMFGVE 164
L+GVLDLDSSLVADYDE+DQEYLE FV + +E T + +MFGV+
Sbjct: 121 LIGVLDLDSSLVADYDEVDQEYLEAFVDLFLEKTTFTFNMFGVK 164

SEQ ID 4444 (GBS282) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 52 (lane 9; MW 19.8kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 60 (lane 6; MW 44.8kDa) and in Figure
63 (lane 7; MW 47kDa).
The GBS282-GST fusion product was purified (Figure 211, lane 4; see also Figure 225, lane 6) and used to
immunise mice. The resulting antiserum was used for FACS (Figure 269), which confirmed that the protein 
is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1449 

A DNA sequence (GBSx1535) was identified in S.agalactiae <SEQ ID 4447> which encodes the amino
acid sequence <SEQ ID 4448>. This protein is predicted to be uridine kinase (udk). Analysis of this protein
sequence reveals the following:
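Each example pairs a DNA sequence with the amino acid sequence it encodes; the correspondence is ordinary codon-by-codon translation. A minimal sketch, assuming the standard genetic code and a sequence that starts in frame. Bacterial genes may also initiate at GTG or TTG, which are read as methionine only when used as the start codon; this sketch does not model that case.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1; ambiguous codons become 'X'."""
    seq = "".join(dna.split()).upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("ATGGCTTAA"))
```

Building the table from the 64-letter string keeps the code compact; a different translation table can be swapped in by replacing that string.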

Possible site: 24

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14675 GB:Z99117 uridine kinase [Bacillus subtilis]
Identities = 133/207 (64%), Positives = 167/207 (80%)

Query: 1   MRKKPIIIGVTGGSGGGKTSVSRAILSNFPDQKITMIEHDSYYKDQSHLTFEERVKTNYD 60
           M K P++IG+ GGSG GKTSV+R+I F I MI+ D YYKDQSHL FEER+ TNYD
Sbjct: 1   MGKNPVVIGIAGGSGSGKTSVTRSIYEQFKGHSILMIMDLYYKDQSHLPFEERLNTNYD 60

Query: 61  HPLAFDTNLMIEQLNELIEGRPVDIPVYDYTKHTRSDRTIRQEPQDVIIVEGILVLEDQR 120
           HPLAFD + +IE + +L+ RP++ P+YDY HTRS+ T+ EP+DVII+EGILVLED+R
Sbjct: 61  HPIAFDNDYLIEHIQDLLOTRPIEKPIYDYKLHTRSEETVHVEPKDVIILEGILVLEDKR 120

Query: 121 LRDLMDIKLFVDTDDDIRIIRRIKRDMEERDRSLDSIIEQYTEWKPMYHQFIEPTKRYA 180
           LRDLMDIKL+VDTD D+RIIRRI RD+ ER RS+DS+IEQY W+PM++QF+EPTKRYA
Sbjct: 121 LRDLMDIKLYVDTDADLRIIRRIMRDINERGRSIDSVIEQYVSWRPMHNQFVEPTKRYA 180

Query: 181 DIVIPEGVSNIVAIDLINTKVASILNE 207
           DI+IPEG N VAIDL+ TK+ +IL +
Sbjct: 181 DIIIPEGGQNHVAIDLMVTKIQTILEQ 207

A related DNA sequence was identified in S.pyogenes <SEQ ID 4449> which encodes the amino acid
sequence <SEQ ID 4450>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9151> which encodes the amino acid sequence
<SEQ ID 9152>. Analysis of this protein sequence reveals the following:

Possible site: 35
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 173/207 (83%), Positives = 193/207 (92%)

Query: 1   MRKKPIIIGVTGGSGGGKTSVSRAILSNFPDQKITMIEHDSYYKDQSHLTFEERVKTNYD 60
           M KKPIIIGVTGGSGGGKTSVSRAIL +FP+ +I MI+HDSYYKDQSH++FEERVKTNYD
Sbjct: 5   MLKKPIIIGVTGGSGGGKTSVSRAILDSFPNARIAMIQHDSYYKDQSHMSFEERVKTNYD 64

Query: 61  HPLAFDTNLMIEQLNELIEGRPVDIPVYDYTKHTRSDRTIRQEPQDVIIVEGILVLEDQR 120
           HPLAFDT+ MI+QL EL+ GRPVDIP+YDY KHTRS+ T RQ+PQDVIIVEGILVLED+R
Sbjct: 65  HPLAFDTDFMIQQLKELLAGRPVDIPIYDYKKHTRSNTTFRQDPQDVIIVEGILVLEDER 124

Query: 121 LRDLMDIKLFVDTDDDIRIIRRIKRDMEERDRSLDSIIEQYTEWKPMYHQFIEPTKRYA 180
           LRDLMDIKLFVDTDDDIRIIRRIKRDM ER RSL+SII+QYT WKPMYHQFIEP+KRYA
Sbjct: 125 LRDLMDIKLFVDTDDDIRIIRRIKRDMMERGRSLESIIDQYTSWKPMYHQFIEPSKRYA 184

Query: 181 DIVIPEGVSNIVAIDLINTKVASILNE 207
           DIVIPEGVSN+VAID+IN+K+ASIL E
Sbjct: 185 DIVIPEGVSNVVAIDVINSKIASILGE 211





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1450 

A DNA sequence (GBSx1536) was identified in S.agalactiae <SEQ ID 4451> which encodes the amino
acid sequence <SEQ ID 4452>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5083 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
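Each "Final Results" block lists a certainty score and a call for three bacterial compartments. When many examples are compared, it can help to read these blocks programmatically. A minimal sketch; the function names are hypothetical, and the regular expression simply assumes the line layout shown in this document, including extraction artefacts such as spaces inside the number:

```python
import re
from typing import Dict, Tuple

# One line per compartment, e.g.
#   bacterial cytoplasm --- Certainty=0.5083 (Affirmative) < succ>
# Spaces inside the number (such as "0 . 5083") are tolerated.
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty=\s*(?P<value>\d[\d\s.]*?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(block: str) -> Dict[str, Tuple[float, str]]:
    """Map each compartment to its (certainty, call) pair."""
    results = {}
    for match in RESULT_LINE.finditer(block):
        value = float("".join(match.group("value").split()))
        results[match.group("site")] = (value, match.group("call").strip())
    return results


def most_likely_site(results: Dict[str, Tuple[float, str]]) -> str:
    """Return the compartment with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    block = """
    bacterial cytoplasm --- Certainty=0.5083 (Affirmative) < succ>
    bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
    bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
    """
    parsed = parse_final_results(block)
    print(parsed, most_likely_site(parsed))
```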

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12572 GB:Z99108 similar to RNA helicase [Bacillus subtilis]
Identities = 140/343 (40%), Positives = 202/343 (58%), Gaps = 9/343 (2%)

Query: 10  QDKLTQRQFDDLTDIQNKLFQPITDGDNILGISPTGTGKTLAYLFPTLLKLQPK-KSQQL 68
           Q+ F T +Q + Q I DG +++ SPTGTGKTLAY P L +++P+ K Q
Sbjct: 16  QENWNASGFQKPTPVQEQAAQLIMDGKDVIAESPTGTGKTLAYALPVLERIKPEQKHPQA 75

Query: 69  LILAPNSELAGQIFDVTKEWAEPLGLTAQLFLSGSSQKRQIERLKKGPEILIGTAGRVFE 128
           +IIAP+ EL QIF V ++W LA + G++ K+Q+E+LKK P I++GT GRVFE
Sbjct: 76  VILAPSRELVMQIFQVIQDWKAGSELRAASLIGGANVKKQVEKLKKHPHIIVGTPGRVFE 135

Query: 129 LVKLKKIKMMNINTIVLDEFDELLGDSQYHFVDNIINRVPRDQQMIYISATNKLDNS 185
           L+K KK+KM + TIVLDE D+L+ + II RD+Q++ SAT K +
Sbjct: 136 LIKAKKLKMHEVKTIVLDETDQLVLPEHRETMKQIIKTTLRDRQLLCFSATLKKETEDVL 195

Query: 186 -KLADNTITIDLSNQKLDT--IKHYYITVDKRERTDLLRKFSNIPDFRGLVFFNSLSDLG 242
           +LA + + K + +KH Y+ D+R++ LL+K S + + LVF + +L
Sbjct: 196 REIAQEPEVLKVQRSKAEAGKVKHQYLICDQRDKVKLLQKLSRLEGMQALVFVRDIGNLS 255

Query: 243 ACEERLQFNRASAVSLASDINIKFRKVILEKFKNHDISLLLGTDLVARGIDIDNLEYVIN 302
           E+L ++ L S+ R I+ F++ + LLL TD+ ARG+DI+NL YVI+
Sbjct: 256 VYAEKLAYHHVELGVLHSEAKKMERAKIIATFEDGEFPLLLATDIAARGLDIENLPYVIH 315

Query: 303 FDIARDKETYTHRSGRTGRMGKEGCVITFVTHKEELKQLKKYA 345
           DI D++ Y HRSGRTGR GKEG V++ VT EE K LKK A
Sbjct: 316 ADIP-DEDGYVHRSGRTGRAGKEGNVLSLVTKLEESK-LKKMA 356
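The "Gaps" figure of a gapped alignment can be cross-checked from its start and end coordinates: every alignment column holds either a residue or a gap character in each sequence, so the gaps in one sequence equal the alignment length minus the residues of that sequence spanned by the alignment. A minimal sketch, assuming a single pairwise local alignment with inclusive coordinates; for the B. subtilis alignment above (Query 10-345, Sbjct 16-356, length 343) it attributes 7 gaps to the query and 2 to the subject, matching the reported 9.

```python
from typing import Tuple


def gap_counts(q_start: int, q_end: int, s_start: int, s_end: int,
               align_len: int) -> Tuple[int, int, int]:
    """Return (gaps in query, gaps in subject, total gaps) for one alignment.

    Coordinates are inclusive. Every alignment column holds a residue or a
    gap in each sequence, so gaps = alignment length - residues spanned.
    """
    q_gaps = align_len - (q_end - q_start + 1)
    s_gaps = align_len - (s_end - s_start + 1)
    if q_gaps < 0 or s_gaps < 0:
        raise ValueError("alignment length is shorter than an aligned span")
    return q_gaps, s_gaps, q_gaps + s_gaps


if __name__ == "__main__":
    # Query 10-345 vs. Sbjct 16-356, alignment length 343.
    print(gap_counts(10, 345, 16, 356, 343))
```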
A related DNA sequence was identified in S.pyogenes <SEQ ID 4453> which encodes the amino acid
sequence <SEQ ID 4454>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3847 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 273/358 (76%), Positives = 312/358 (86%)

Query: 1   MITKFPDQWQDKLTQRQFDDLTDIQNKLFQPITDGDNILGISPTGTGKTLAYLFPTLLKL 60
           MITKFP QWQ+KL Q F LT IQ + FQPI DG N LGISPTGTGKTLAY+FP LL L
Sbjct: 12  MITKFPPQWQEKLDQVAFTHLTPIQEQAFQPIVDGKNFLGISPTGTGKTLAYVFPNLLAL 71

Query: 61  QPKKSQQLLIIAPNSELAGQIFDVTKEWAEPLGLTAQLFLSGSSQKRQIERLKKGPEILI 120
            PKKSQQLLIIAPN+ELAGQIF+VTK+WA+PLGLTAQLF+SG+SQKRQIERLKKGPEILI
Sbjct: 72  TPKKSQQLLILAPNTELAGQIFEVTKDWAQPLGLTAQLFISGTSQKRQIERLKKGPEILI 131

Query: 121 GTAGRVFELVKLKKIKMMNINTIVLDEFDELLGDSQYHFVDNIINRVPRDQQMIYISATN 180
           GT GR+FEL+KLKKIKMM++NTIVLDE+DELLGDSQY FV I + VPRD QM+Y+SATN
Sbjct: 132 GTPGRIFELIKLKKIKMMSVNTIVLDEYDELLGDSQYDFVQKISHYVPRDHQ^WYMSATN 191

Query: 181 KLDNSKLADNTITIDLSNQKLDTIKHYYITVDKRERTDLLRKFSNIPDFRGLVFFNSLSD 240
           K+D + LA NT IDLS Q D I+H+Y+ VDKRERTDLLRKF+NIP FR LVFFNSLSD
Sbjct: 192 KOTQTSIAPNTFCIDLSEQTNDAIQHFYLMVDKRERTDLLRKFTNIPHFRALVFFNSLSD 251

Query: 241 LGACEERLQFNRASAVSLASDINIKFRKVILEKFKNHDISLLLGTDLVARGIDIDNLEYV 300
           LGA EERLQ+N A+AVSLASDIN+KFRK ILEKFK+H +SLLL TDLVARGIDIDNL+YV
Sbjct: 252 LGATEERLQYNGAAAVSLASDINVKFRKTILEKFKSHQLSLLLATDLVARGIDIDNLDYV 311

Query: 301 INFDIARDKETYTHRSGRTGRMGKEGCVITFVTHKEELKQLKKYATVTELVLHNQKLH 358
           I+FD+ARDKE YTHR+GRTGRMGK G VITFV+H E+LK+LKK+A V+E+ L NQ+LH
Sbjct: 312 IHFDVARDKENYTHRAGRTGRMGKSGIVITFVSHPEDLKKLKKFAKVSEISLKNQQLH 369



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1451 

A DNA sequence (GBSx1537) was identified in S.agalactiae <SEQ ID 4455> which encodes the amino
acid sequence <SEQ ID 4456>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.38 Transmembrane 15 - 31 ( 13 - 31)

Final Results

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
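The INTEGRAL line reports a predicted transmembrane segment (residues 15-31 here) together with a likelihood score. The program and scoring behind those figures are not described in this section. A commonly used and simpler way to flag candidate membrane-spanning stretches is the Kyte-Doolittle hydropathy scale averaged over a 19-residue window, where a mean of roughly 1.6 or more suggests a possible transmembrane helix. The sketch below uses that swapped-in method; it will not reproduce the likelihood values quoted here, and the toy sequence is hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence: str, window: int = 19) -> List[Tuple[int, int, float]]:
    """Mean hydropathy of every window as (first residue, last residue, mean), 1-based."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        (start + 1, start + window, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int, float]]:
    """Windows whose mean hydropathy reaches the threshold."""
    return [w for w in hydropathy_windows(sequence, window) if w[2] >= threshold]


if __name__ == "__main__":
    toy = "K" * 5 + "L" * 19 + "K" * 5  # hypothetical sequence with one hydrophobic stretch
    print(max(hydropathy_windows(toy), key=lambda w: w[2]))
```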

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1452

A DNA sequence (GBSx1538) was identified in S.agalactiae <SEQ ID 4457> which encodes the amino
acid sequence <SEQ ID 4458>. This protein is predicted to be peptidoglycan GlcNAc deacetylase. Analysis
of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.92 Transmembrane 4 - 20 ( 1 - 26)

Final Results

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB96552 GB:AJ251472 peptidoglycan GlcNAc deacetylase
[Streptococcus pneumoniae]
Identities = 133/431 (30%), Positives = 228/431 (52%), Gaps = 20/431 (4%)

Query: 5   IIGIFSLIIIAILAWQGFSFLKHK--EIKLQQAVVEKEIRIAEKTVEVVKRQKTERVLFL 62
           +IGI ++ I + + F + K E K++ EK+ +++E + RQ V+
Sbjct: 21  LIGILAISICLLGGFIAFKIYQQKSFEQKIESLKKEKDDQLSEGNQKEHFRQGQAEVIAY 80

Query: 63  EPKGYDKSLSADILKWNQKSFEHKKFYDNQYIILRPQLADSNFANVKKLSIYQILYQKEK 122
           P +K +S+ NQ + + DN Q +S V ++ + +Y
Sbjct: 81  YPLQGEKVISSVRELINQDVKDKLESKDNLVFYYTEQ-EESGLKGVVNRNVTKQIYDLVA 139

Query: 123 GSMFQKSSRLLRTYLLDQNKKPFELDELLAHNISGFKAILENIAPGTQLK--EHDSNKEF 180
           + + L L ++ +PF LD+L + + +++ + + K ED +++
Sbjct: 140 FKIEETEKTSLGKVHLTEDGQPFTLDQLFSDASKAKEQLIKELTSFIEDKKIEQDQSEQI 199

Query: 181 LKTGRVTD GLDVKDGKLII NDLKLPLDKLYNVIDESYLKSSDLDLVS 227
           +K D D KD ++I+ ++ LP+ ++VI SYL D L
Sbjct: 200 VKNFSDQDLSAWNFDYKDSQIILYPSPWENLEEIALPVSAFFDVIQSSYLLEKDAALYQ 259

Query: 228 NLKAKAPR--VALTFDDGPNEKTTPKALEILKRYNAKATFFVMGQSAVGHTDILQRMHAE 285
           + K + VALTFDDGPN TTP+ LE L +Y+ KATFFV+G++ G+ D+++R+ +E
Sbjct: 260 SYFDKKHQKVVALTFDDGPNPATTPQVLETLAKYDIKATFFVLGKNVSGNEDLVKRIKSE 319

Query: 286 GHEIGNHTWDHPNLTKLPAEKIKEEIHKTNDLIMKATGQKPVYLRPPYGATNATVKTVTG 345
           GH +GNH+W HP L++L ++ K++I T D++ K G +RPPYGA ++
Sbjct: 320 GHVVGNHSWSHPILSQLSLDEAKKQITDTEDVLTKVLGSSSKLMRPPYGAITDDIRNSLD 379

Query: 346 LKEMLWSVDTEDWKNHNTQA^MTNIKKQLRPGGVILMHDIHQTTIDALPTIMDYLTTQGY 405
           L ++W VD+ DWK+ N +++T 1+ Q+ G ++LMHDIH T++ALP +++YL QGY
Sbjct: 380 LSFIMWDVDSLDWKSKNEASILTEIQHQVANGSIVLMHDIHSPTVNALPRVIEYLKNQGY 439

Query: 406 YFVTVGELYST 416
           FVT+ E+ +T
Sbjct: 440 TFVTIPEMLNT 450

A related DNA sequence was identified in S.pyogenes <SEQ ID 4459> which encodes the amino acid
sequence <SEQ ID 4460>. Analysis of this protein sequence reveals the following:

Possible site: 22
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -12.58 Transmembrane 6 - 22 ( 1 - 27)

Final Results

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the databases:

!GB:AJ251472 peptidoglycan GlcNAc deacetylase [Strep... 239 4e-62

>GP:CAB96552 GB:AJ251472 peptidoglycan GlcNAc deacetylase
[Streptococcus pneumoniae]

Identities = 136/438 (31%), Positives = 230/438 (52%), Gaps = 23/438 (5%)

Query: 3   KLNVILVGLLSILMLSLAI VFINRWKIMEDSQRIVLAEKKKNTSDLVIKAVKHIKK 58
           K +L+ L+ IL +S+ + + ++ Q+I +K+K+ +H ++
Sbjct: 13  KTRHVLLALIGILAISICLLGGFIAFKIYQQKSFEQKIESLKKEKDDQLSEGNQKEHFRQ 72

Query: 59  DQKDYYYFSPIK--QADDFFVDNLPVSLYKKKNSDKELILVRPKLQSSHLRSVNTLTISK 116
           Q + + P++ + ++ + KS L+ + + SL+V ++K
Sbjct: 73  GQAEVIAYYPLQGEKVISSVRELINQDVKDKLESKDNLVFYYTEQEESGLKGVVNRNVTK 132

Query: 117 IVYQKKFFHLAKKSEKVISTYHVTDDLKPFQVKDLVSGHL ERIQEEVEKKYPDAGFN 173
           +Y F + + + + H+T+D +PF + L S E++ +E+ D
Sbjct: 133 QIYDLVAFKIEETEKTSLGKVHLTEDGQPFTLDQLFSDASKAKEQLIKELTSFIEDKKIE 192

Query: 174 SDKYNGLKESNS LLSDGFEVKSGNLIFD KKLTI PLTTLFD VINPDFLAN 222
           D+ + ++ S L + F+ K +1 +++ +P++ FDVI +L
Sbjct: 193 QDQSEQIVKNFSDQDLSAWNFDYKDSQIILYPSPWENLEEIALPVSAFFDVIQSSYLLE 252

Query: 223 SDRAAYDMYRTYKEQHPKKLVALTFDDGPDPTTTPQVLDILAKYQAKGTFFMIGSKVVNN 282
           D A Y +Y K Q K+VALTFDDGP+P TTPQVL+ LAKY K TFF++G V N
Sbjct: 253 KDAALYQSYFDKKHQ KVVALTFDDGPNPATTPQVLETLAKYDIKATFFVLGKNVSGN 309

Query: 283 ENLTKRVSDAGHEIANHTWDHPNLTNLSVSEIQHQVNMTNQAIEKACGKKPRYLRPPYGA 342
           E+L KR+ GH + NH+W HP L+ LS+ E + Q+T +KG + +RPPYGA
Sbjct: 310 EDLVKRIKSEGHVVGNHSWSHPILSQLSLDEAKKQITDTEDVLTKVLGSSSKLMRPPYGA 369

Query: 343 TNATVQQSSGLTQMLWTVDTRDWENHSTDGIMTNVKNQLQPGGVVLMHDIHQTTINALPT 402
           ++ S L+ ++W VD+ DW++ + I+T +++Q+ G +VLMHDIH T+NALP
Sbjct: 370 ITDDIRNSLDLSFIMWDVDSLDWKSKNEASILTEIQHQVANGSIVLMHDIHSPTVNALPR 429

Query: 403 VMEYLKAEGYECVTVSEL 420
           V+EYLK +GY VT+ E+
Sbjct: 430 VIEYLKNQGYTFVTIPEM 447

An alignment of the GAS and GBS proteins is shown below.

Identities = 169/420 (40%), Positives = 259/420 (61%), Gaps = 12/420 (2%)

Query: 4   LIIGIFSLIIIAILAWQGFSFLKHKEIKLQQAVVEKEIRIAEKTVEVVKRQKTER--VLF 61
           +++G+ S+++++ LA + K E + + EK+ ++ ++ VK K ++ +
Sbjct: 7   ILVGLLSILMLS-LAIVFINRWKLiNEDSQRIvLAEKKKNTSDL 65

Query: 62  LEPKGYDKSLSADILKWNQKSFEHKKFYDNQYIILRPQLADSNFANVKKLSIYQILYQKE 121
           P D L S KK D + I++RP+L S+ +V L+I +I+YQK+
Sbjct: 66  FSPIKQADDFFVDNLPVSLYKKKNSDKELILVRPKLQSSHLRSVNTLTISKIVYQKK 122

Query: 122 KGSMFQKSSRLLRTYLLDQNKKPFELDELLAHNISGFKAILENIAPGTQLKEHDSNKEFL 181
           + +KS +++ TY + + KPF++ +L++ ++ + +E P N
Sbjct: 123 FFHLAKKSEKVISTYHVTDDLKPFQVKDLVSGHLERIQEEVEKKYPDAGFNSDKYNGLKE 182

Query: 182 KTGRVTDGLD VKDGKLI IND-LKLPLDKLYNVIDESYLKSSDLDLVSNL KAKAPR-- 235
           ++DG +VK G LI + L +PL L++VI+ +L +SD N K + P+
Sbjct: 183 SNSLLSDGFEVKSGNLIFDKKLTIPLTTLFDVINPDFLRNSDRAAYDNYRTYKEQHPKKL 242

Query: 236 VALTFDDGPNEKTTPKALEILKRYNAKATFFVMGQSAVGHTDILQRMHAEGHEIGNHTWD 295
           VALTFDDGP+ TTP+ L+IL +Y AK TFF++G V + ++ +R+ GHEI NHTWD
Sbjct: 243 VALTFDDGPDPTTTPQVLDILAKYQAKGTFFMIGSKVVNNENLTKRVSDAGHEIANHTWD 302

Query: 296 HPNLTKLPAEKIKEEIHKTNDLIMKATGQKPVYLRPPYGATNATVKTVTGLKEMLWSVDT 355
           HPNLT L +1+ +++ TN I KA G+KP YLRPPYGATNATV+ +GL +MLW+VDT
Sbjct: 303 HPNLTNLSVSEIQHQVNMTNQAIEKACGKKPRYLRPPYGATNATVQQSSGLTQMLWTVDT 362

Query: 356 EDWKNHNTQJ^TNIKKQLRPGGVILMHDIHQTTIDALPTIMDYLTTQGYYFVTVGELYS 415

DW+NH+T +MTN+K QL+PGGV+LMHDIHQTTI+ALPT+M+YL +GY VTV ELY+
Sbjct: 363 RDVffilfflSTDGIMTNVKNQLQPGGWLMHDIHOT^ 422

GBS281d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 152 (lanes 8-10; MW 71.5kDa) and in Figure 187 (lane 10; MW 71kDa). It was also
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 152
(lane 12; MW 46.5kDa) and in Figure 183 (lane 2; MW 46kDa). Purified GBS281d-GST is shown in lane 6
of Figure 237.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1453 

A DNA sequence (GBSx1539) was identified in S.agalactiae <SEQ ID 4461> which encodes the amino
acid sequence <SEQ ID 4462>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2488 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4463> which encodes the amino acid
sequence <SEQ ID 4464>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2799 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 311/475 (65%), Positives = 389/475 (81%)

Query: 1   MTKEYQNYVNGEWKSSVNQIEILSPIDDSSLGFVPAMTREEVDHAMKAGREALPAWAALT 60
           + K+Y+N VNGEWK S N+I I +P LG VPAMT+ EVD + ++AL W AL+
Sbjct: 1   LAKQYKNLVNGEWKLSENEITIYAPATGEELGSVPAMTQAEVDAVYASAKKALSDWRALS 60

Query: 61  VYERAQYLHKAADIIERDKEEIATVLAKEISKAYNASVTEVVRTADLIRYAAEEGIRLST 120
             ERA YLHKAADI+ RD E+I +L+KE++K + A+V+EV+RTA+ + I YAAEEG+R+
Sbjct: 61  YVERAAYLHKAADILVRDAEKIGAILSKEVAKGHKAAVSEVIRTAEIINYAAEEGLRMEG 120

Query: 121 SADEGGKMDASTGHKLAVIRRQPVGIVLAIAPYNYPVNLSGSKIAPALIGGNVVMFKPPT 180
              EGG +A++ K+A++RR+PVG+VLAI+P+NYPVNL+GSKIAPALI GNVV KPPT
Sbjct: 121 EVLEGGSFEAASKKKIAIVVREPVGLVIAISPFNYPVNLAGSKIAPALIAGNVVALKPPT 180

Query: 181 QGSVSGLVIAKAFAEAGLPAGVFNTITGRGSEIGDYIVEHEEVNFINFTGSTPVGKRIGK 240
           QGS+SGL+LA+AFAEAG+PAGVFNTITGRGS IGDYIVEHE V+FINFTGSTP+G+ IGK
Sbjct: 181 QGSISGLLIAEAFAEAGIPAGVFNTITGRGSVIGDYIVEHEAVSFINFTGSTPIGEGIGK 240

Query: 241 LAGMRPIMLELGGKDAGVVLADADLDNAAKQIVAGAYDYSGQRCTAIKRVLVVEEVADEL 300
           LAGMRPIMLELGGKD+ +VL DADL AAK IVAGA+ YSGQRCTA+KRVLV+++VAD+L
Sbjct: 241 LAGMRPIMLELGGKDSAIVLEDADIAIiAAKNIVAGAFGYSGQRCTAVTQlVLvMDKVADQL 300

Query: 301 AEKISENVAKLSVGDPFDNATVTPVIDDNSADFIESLVVDARQKGAKELNEFKRDGRLLT 360
           A +I V KLSVG P D+A +TP+ID ++ADF+E L+ DA KGA L F R+G L++
Sbjct: 301 AAEIKTLVEKLSVGMPEDDADITPLIDTSAADFVEGLIKDATDKGATALTAFNREGNLIS 360

Query: 361 PGLFDHVTLDMKIAWEEPFGPILPIIRVKDAEEAVAIANKSDFGLQSSVFTRDFQKAFDI 420
           P LFDHVT DM+LAWEEPFGP+LPIIRV EEA+ I+N+S++GLQ+S+FT +F KAF I
Sbjct: 361 PVLFDHVTTDMRLAWEEPFGPVLPIIRVTTVEEAIKISNESEYGLQASIFTTNFPKAFGI 420

Query: 421 ANKLEVGTVHINNKTGRGPDNFPFLGLKGSGAGVQGIRYSIEAMTNVKSIVFDMK 475
           A +LEVGTVH+NNKT RG DNFPFLG K SGAGVQG++YSIEAMT VKS+VFD++
Sbjct: 421 AEQLEVGTVHLNNKTQRGTDNFPFLGAKKSGAGVQGVKYSIEAMTTVKSVVFDIQ 475

A related GBS gene <SEQ ID 8815> and protein <SEQ ID 8816> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: -15.11

GvH: Signal Score (-7.5): 0.17

Possible site: 57
>>> Seems to have no N-terminal signal sequence
ALOM program count: 0 value: 1.22 threshold: 0.0
PERIPHERAL Likelihood = 1.22 187

modified ALOM score: -0.74

*** Reasoning Step: 3

Final Results

bacterial cytoplasm --- Certainty=0.2488 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

66.8/82.6% over 474aa

Streptococcus mutans

EGAD|42413| NADP-dependent glyceraldehyde-3-phosphate dehydrogenase Insert characterized
EGAD|42413|110509 NADP-dependent glyceraldehyde-3-phosphate dehydrogenase Insert
characterized

SP|Q59931|GAPN_STRMU NADP-DEPENDENT GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE (EC 1.2.1.9)
(NON-PHOSPHORYLATING GLYCERALDEHYDE 3-PHOSPHATE DEHYDROGENASE) (GLYCERALDEHYDE-3-PHOSPHATE
DEHYDROGENASE [NADP+]) (TRIOSEPHOSPHATE DEHYDROGENASE). Edit characterized
GP|642667|gb|AAA91091.1|L38521 NADP-dependent glyceraldehyde-3-phosphate dehydro Insert
characterized

ORF01688 (301-1725 of 2025)

EGAD|42413|44796 (1-475 of 475) NADP-dependent glyceraldehyde-3-phosphate dehydrogenase
{Streptococcus mutans}EGAD|42413|110509 NADP-dependent glyceraldehyde-3-phosphate
dehydrogenase {Streptococcus mutans}SP|Q59931|GAPN_STRMU NADP-DEPENDENT GLYCERALDEHYDE-3-PHOSPHATE
DEHYDROGENASE (EC 1.2.1.9) (NON-PHOSPHORYLATING GLYCERALDEHYDE 3-PHOSPHATE
DEHYDROGENASE) (GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE [NADP+]) (TRIOSEPHOSPHATE
DEHYDROGENASE).GP|642667|gb|AAA91091.1|L38521 NADP-dependent glyceraldehyde-3-phosphate
dehydro

%Match = 49.3

%Identity = 66.7 %Similarity = 82.5

Matches = 317 Mismatches = 83 Conservative Subs. = 75
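The summary figures for this comparison are internally consistent, which simple arithmetic confirms: 317 matches + 83 mismatches + 75 conservative substitutions = 475 aligned positions; %Identity = 317/475 ≈ 66.7; %Similarity also counts the conservative substitutions, (317 + 75)/475 ≈ 82.5; and the ORF span 301-1725 is 1425 nucleotides, i.e. 475 codons. A minimal sketch of these checks, assuming inclusive nucleotide coordinates and that the percentages are taken over matches + mismatches + conservative substitutions (the program's own conventions are not stated here):

```python
from typing import Tuple


def summary_percentages(matches: int, mismatches: int,
                        conservative: int) -> Tuple[int, float, float]:
    """Return (aligned positions, %identity, %similarity) for an ungapped comparison."""
    aligned = matches + mismatches + conservative
    identity = 100.0 * matches / aligned
    similarity = 100.0 * (matches + conservative) / aligned
    return aligned, round(identity, 1), round(similarity, 1)


def codons_in_span(start_nt: int, end_nt: int) -> int:
    """Number of codons in an inclusive nucleotide span."""
    length = end_nt - start_nt + 1
    if length % 3:
        raise ValueError("span is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    print(summary_percentages(317, 83, 75))  # figures from the comparison above
    print(codons_in_span(301, 1725))         # ORF01688 (301-1725)
```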

       195       225       255       285       315       345       375       405
*GLKNLYFFIESLDIVKFLRKICQIIEINR*SDRINLLQCKRRFTLTKEYQNYVNGEWKSSVNQIEILSPIDDSSLGFVP
                                             MTKQYKNYVNGEWKLSENEIKIYEPASGAELGSVP
                                                     10        20        30

       435       465       495       525       555       585       615       645
AMTREEVDHAMKAGREALPAWAALTVYERAQYLHKAADIIERDKEEIATVLAKEISKAYNASVTEVVRTADLIRYAAEEG
AMSTEEVDYWASAKKAQPAWRALSYIERAAYLHKVADILMRDKEKIGAILSKEVAKGYKSAVSEVTOTAEIINYAAEEG
             50        60        70        80        90       100       110

       675       705       735       765       795       825       855       885
IRLSTSADEGGKMDASTGHKLAVIRRQPVGIVLAIAPYNYPVNLSGSKIAPALIGGNVVMFKPPTQGSVSGLVIAKAFAE
LRMEGEVLEGGSFFAASKKKIAVWREPVGLVIAISPFNYPVNIAGSK
            130       140       150       160       170       180       190

       915       945       975      1005      1035      1065      1095      1125
AGLPAGVFNTITGRGSEIGDYIVEHEEVNFINFTGSTPVGKRIGKLAGMRPIMLELGGKDAGVVLADADLDNAAKQIVAG
AGLPAGVFNTITGRGSEIGDYIVEHQAVNFINFTGSTGIGERIGKMAGMRPIMLELGGKDSAIVLEDADLELTAKNIIAG
            210       220       230       240       250       260       270

      1155      1185      1215      1245      1275      1305      1335      1365
AYDYSGQRCTAIKRVLVVEEVADELAEKISENVAKLSVGDPFDNATVTPVIDDNSADFIESLVVDARQKGAKELNEFKRD
AFGYSGQRCTAVKRVLVMESVADELVEKIREKVLALTIGNPEDDADITPLIDTKSADYVEGLINDANDKGATALTEIKRE
            290       300       310       320       330       340       350

      1395      1425      1455      1485      1515      1545      1575      1605
GRLLTPGLFDHVTLDMKIAWEEPFGPILPIIRVKDAEEAVAIANKSDFGLQSSVFTRDFQKAFDIANKLEVGTVHINNKT
GNLICPILFDKVTTDMRIAWEEPFGPVLPIIROTSVEEAIEISNKSEYGLQ
            370       380       390       400       410       420       430

      1635      1665      1695      1725      1755      1785      1815      1845
GRGPDNFPFLGLKGSGAGVQGIRYSIEAMTNVKSIVFDMK*T*NDSTIVS*WL*TSFTLKIKNYIIF*SGFIFVI*LS*
QRGTDNFPFLGAKKSGAGIQGVKYSIEAMTTVKSVVFDIK
            450       460       470

SEQ ID 8816 (GBS127) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 10; MW 55.9kDa).

GBS127-His was purified as shown in Figure 200, lane 9.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1454 

A DNA sequence (GBSx1540) was identified in S.agalactiae <SEQ ID 4465> which encodes the amino
acid sequence <SEQ ID 4466>. Analysis of this protein sequence reveals the following:

Possible site: 17
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 427 - 443 ( 427 - 443)

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA78049 GB:AB027569 phosphoenolpyruvate-protein
phosphotransferase [Streptococcus bovis]

Identities = 534/577 (92%), Positives = 559/577 (96%)

Query: 1   MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDTNAEEARLDVALQASQDELSVIRE 60
           MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDT+AEEARLD AL+ASQDELS+IRE
Sbjct: 1   MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDTSAEEARLDAALKASQDELSIIRE 60

Query: 61  KAVESLGEEAAAVFDAHLMVLSDPEMINQIKETIRAKQVNAETGLKEVTDMFITIFEGME 120
           KAVE+LGEEAAAVFDAHLMVL+DPEMI+QIKETIRAKQ NAE GLKEVTDMFITIFEGME
Sbjct: 61  KAVETLGEEAAAVFDAHLMVLADPEMISQIKETIRAKQTNAEAGLKEVTDMFITIFEGME 120



WO 02/34771 



PCT/GB01/04789 



-1606- 



Query: 121 DNPYMQERAADIRDVAKRVLAHLLGVKLPNPATINEESIVIAHDLTPSDTAQLNKQFVKA 180 

DNPYMQERAADIRDVAKRVLAHLLG KLPNPATI+EESIVIAHDLTPSDTAQLNKQFVKA 
Sbjct: 121 DNPYMQERAADIRDVAKRVLAHLLGAKLPNPATIDEESIVIAHDLTPSDTAQLNKQFVKA 180 

Query: 181 FVTNIGGRTSHSAIMARTLEIAAVLGTNDITERVQDGQLIAVNGITGEVIIEPTEAQISA 240
           FVTNIGGRTSHSAIMARTLEIAAVLGTNDIT RV+DG ++AVNGITGEVII PT+ Q++
Sbjct: 181 FVTNIGGRTSHSAIMARTLEIAAVLGTNDITSRVKDGDIVAVNGITGEVIINPTDEQVAE 240

Query: 241 FKAAGEAYAKQKAEWALLKDAQTVTADGKHFELAANIGTPKDVEGVNENGAEAVGLYRTE 300
           FKAAGEAYAKQKAEWALLKDA+TVTADGKHFELAANIGTPKDVEGVN NGAEAVGLYRTE
Sbjct: 241 FKAAGEAYAKQKAEWALLKDAKTVTADGKHFELAANIGTPKDVEGVNANGAEAVGLYRTE 300

Query: 301 FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVVVRTMDIGGDKELPYFDLPKEMNPFLGFR 360
           FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVVVRTMDIGGDKELPY DLPKEMNPFLGFR
Sbjct: 301 FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVVVRTMDIGGDKELPYLDLPKEMNPFLGFR 360

Query: 361 ALRISISETGDAMFRTQIRALLRASVHGQLRIMFPMVALLKEFRAAKAIFEEEKANLLAD 420
           ALRISISETG+AMFRTQIRALLRASVHGQLRIMFPMVALLKEFRAAKAIF+EEKANL A+
Sbjct: 361 ALRISISETGNAMFRTQIRALLRASVHGQLRIMFPMVALLKEFRAAKAIFDEEKANLKAE 420

Query: 421 GVAVAEGIEVGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 480
           GVAV++ I+VGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP
Sbjct: 421 GVAVSDDIQVGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 480

Query: 481 YNPSILRLINNVIKAAHAEGKWAGMCGEMAGDQTAVPLLVGMGLDEFSMSATSVLRTRSL 540
           YNPSILRLINNVIKAAHAEGKW GMCGEMAGDQ AVPLLV MGLDEFSMSATS+LRTRSL
Sbjct: 481 YNPSILRLINNVIKAAHAEGKWVGMCGEMAGDQKAVPLLVEMGLDEFSMSATSILRTRSL 540

Query: 541 MKKLDTAKMEEYANRALSECSTMEEVIELQKEYVDFD 577
           MKKLDTAKM+EYANRAL+ECSTMEEV+EL KEYV+ D
Sbjct: 541 MKKLDTAKMQEYANRALTECSTMEEVLELSKEYVNVD 577

A related DNA sequence was identified in S.pyogenes <SEQ ID 4467> which encodes the amino acid
sequence <SEQ ID 4468>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0875 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 540/577 (93%), Positives = 561/577 (96%)

Query: 1   MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDTNAEEARLDVALQASQDELSVIRE 60
           MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTV DTNAEEARLDVALQA+QDELSVIRE
Sbjct: 1   MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVADTNAEEARLDVALQAAQDELSVIRE 60

Query: 61  KAVESLGEEAAAVFDAHLMVLSDPEMINQIKETIRAKQVNAETGLKEVTDMFITIFEGME 120
            AVESLGEEAAAVFDAHLMVL+DPEMI+Q+KETIRAKQ NAETGLKEVTDMFITIFEGME
Sbjct: 61  NAVESLGEEAAAVFDAHLMVLADPEMISQVKETIRAKQTNAETGLKEVTDMFITIFEGME 120

Query: 121 DNPYMQERAADIRDVAKRVLAHLLGVKLPNPATINEESIVIAHDLTPSDTAQLNKQFVKA 180
           DNPYMQERAADIRDVAKRVLAHLLGVKLPNPATINEESIVIAHDLTPSDTAQLNKQFVKA
Sbjct: 121 DNPYMQERAADIRDVAKRVLAHLLGVKLPNPATINEESIVIAHDLTPSDTAQLNKQFVKA 180

Query: 181 FVTNIGGRTSHSAIMARTLEIAAVLGTNDITERVQDGQLIAVNGITGEVIIEPTEAQISA 240
           FVTNIGGRTSHSAIMARTLEIAAVLGTNDIT+RV+DG +IAVNGITGEVII+P+E Q+ A
Sbjct: 181 FVTNIGGRTSHSAIMARTLEIAAVLGTNDITKRVTQ3GDVIAVNGITGEVIIDPSEDQVLA 240

Query: 241 FKAAGEAYAKQKAEWALLKDAQTVTADGKHFELAANIGTPKDVEGVNENGAEAVGLYRTE 300
           FK AG AYAKQKAEW+LLKDA T TADGKHFELAANIGTPKDVEGVN+NGAEAVGLYRTE
Sbjct: 241 FKEAGAAYAKQKAEWSLLKDAHTETADGKHFELAANIGTPKDVEGVNDNGAEAVGLYRTE 300

Query: 301 FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVVVRTMDIGGDKELPYFDLPKEMNPFLGFR 360
           FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVVVRTMDIGGDKELPYFDLPKEMNPFLGFR
Sbjct: 301 FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVVVRTMDIGGDKELPYFDLPKEMNPFLGFR 360

Query: 361 ALRISISETGDAMFRTQIRALLRASVHGQLRIMFPMVALLKEFRAAKAIFEEEKANLLAD 420
           ALRISISETGDAMFRTQ+RALLRASVHGQLRIMFPMVALLKEFRAAKA+F+EEKANLLA+
Sbjct: 361 ALRISISETGDAMFRTQMRALLRASVHGQLRIMFPMVALLKEFRAAKAVFDEEKANLLAE 420

Query: 421 GVAVAEGIEVGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 480
           GVAVA+ I+VGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP
Sbjct: 421 GVAVADDIQVGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 480

Query: 481 YNPSILRLINNVIKAAHAEGKWAGMCGEMAGDQTAVPLLVGMGLDEFSMSATSVLRTRSL 540
           YNPSILRLINNVIKAAHAEGKWAGMCGEMAGDQ AVPLLVGMGLDEFSMSATSVLRTRSL
Sbjct: 481 YNPSILRLINNVIKAAHAEGKWAGMCGEMAGDQQAVPLLVGMGLDEFSMSATSVLRTRSL 540

Query: 541 MKKLDTAKMEEYANRALSECSTMEEVIELQKEYVDFD 577
           MKKLD+AKMEEYANRAL+ECST EEV+EL KEYV  D
Sbjct: 541 MKKLDSAKMEEYANRALTECSTAEEVLELSKEYVSED 577

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1455 

A DNA sequence (GBSx1541) was identified in S.agalactiae <SEQ ID 4469> which encodes the amino
acid sequence <SEQ ID 4470>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1421 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to a protein from S.bovis:

>GP:BAA78048 GB:AB027569 histidine containing protein [Streptococcus bovis]
Identities = 86/87 (98%), Positives = 87/87 (99%)

Query: 1   MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60
           MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD
Sbjct: 1   MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60

Query: 61  VTISAEGADADDAIAAIEETMTKEGLA 87
           VTISAEGADADDA+AAIEETMTKEGLA
Sbjct: 61  VTISAEGADADDALAAIEETMTKEGLA 87
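Identity counts for an ungapped alignment such as this one can be recomputed directly from the two aligned rows: count the columns where both residues are identical, with gap characters ('-') counting toward the length but never as identities. A minimal sketch (hypothetical helper) that reproduces the 86 identical positions out of 87 reported above; the only difference is I/L at position 74. Counting Positives additionally requires a substitution matrix and is not attempted here.

```python
from typing import Tuple


def count_identities(query: str, subject: str) -> Tuple[int, int]:
    """Return (identical columns, alignment length) for two aligned rows.

    Gap characters ('-') count toward the length but never as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


if __name__ == "__main__":
    common = "MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD"
    gbs = common + "VTISAEGADADDAIAAIEETMTKEGLA"
    bovis = common + "VTISAEGADADDALAAIEETMTKEGLA"
    print(count_identities(gbs, bovis))
```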

A related DNA sequence was identified in S.pyogenes <SEQ ID 4471> which encodes the amino acid
sequence <SEQ ID 4472>. Analysis of this protein sequence reveals the following:

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1421 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/87 (98%) , Positives = 87/87 (99%) 
Query: 1   MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60
           MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD
Sbjct: 1   MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60

Query: 61  VTISAEGADADDAIAAIEETMTKEGLA 87
           VTISAEGADA+DAIAAIEETMTKEGLA
Sbjct: 61  VTISAEGADAEDAIAAIEETMTKEGLA 87

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1456 

A DNA sequence (GBSx1542) was identified in S.agalactiae <SEQ ID 4473> which encodes the amino
acid sequence <SEQ ID 4474>. This protein is predicted to be glutaredoxin-like protein nrdh (b2673).
Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4532 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA63372 GB:X92690 glutaredoxin-like protein [Lactococcus
lactis]

Identities = 42/70 (60%), Positives = 53/70 (75%)

Query: 4   ITVFSKNNCMQCKMTKKFLDQHGADFEEINIDEKPEKIEYVKNLGFSAAPVIEAGNVVFS 63
           +TV+SKNNCMQCKM KK+L +H F EINIDE+PE +E V +GF AAPVI + FS
Sbjct: 2   VTVYSKNNCMQCKIWKKWLSEHElAFNEINIDEQPEFVEKVIEMGFRAAPVITKDDFAFS 61

Query: 64  GFQPSKLKEL 73
           GF+PS+L +L
Sbjct: 62  GFRPSELAKL 71






A related DNA sequence was identified in S.pyogenes <SEQ ID 4475> which encodes the amino acid 
sequence <SEQ ID 4476>. Analysis of this protein sequence reveals the following: 



Possible site: 17 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.4606 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 56/71 (78%) , Positives = 68/71 (94%) 

Query: 4 ITVFSKNNCMQCKMTKKFLDQHGADFEEINIDEKPEKIEYVKNLGFSAAPVIEAGNVVFS 63
ITV+SKNNCMQCKMTKKFL+QHG +F+EINIDE PEK++YVK+LGF++APVIEA N+VFS

Sbjct: 13 ITVYSKNNCMQCKMTKKFLEQHGVNFQEINIDEHPEKVDYVKSLGFTSAPVIEADNLVFS 72 

Query: 64 GFQPSKLKELV 74 
GFQP+KLKEL+ 
Sbjct: 73 GFQPAKLKELI 83



WO 02/34771 



PCT/GB01/04789 



-1609- 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
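Each "Final Results" block scores three candidate locations, and in every entry in this section the single "Affirmative" line carries the highest certainty. As a minimal sketch, assuming only the line layout shown above (including the stray spaces and dashes that appear in this copy), the lines could be parsed and ranked as follows; `Localization` and `parse_final_results` are illustrative names, not part of the prediction program:

```python
import re
from typing import Iterable, List, NamedTuple


class Localization(NamedTuple):
    site: str         # e.g. "bacterial cytoplasm"
    certainty: float  # e.g. 0.4532
    call: str         # "Affirmative" or "Not Clear"


# Hypothetical pattern for one result line. It tolerates the stray spaces
# seen in this copy ("Certainty=0 . 4532") and optional "---" separators.
_RESULT = re.compile(
    r"^\s*(?P<site>bacterial [a-z ]+?)[\s-]*Certainty=\s*"
    r"(?P<whole>\d+)\s*\.\s*(?P<frac>\d+)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(lines: Iterable[str]) -> List[Localization]:
    """Parse result lines and return them, highest certainty first."""
    hits = []
    for line in lines:
        m = _RESULT.match(line)
        if m:
            value = float(m.group("whole") + "." + m.group("frac"))
            hits.append(Localization(m.group("site"), value, m.group("call")))
    # sorted() is stable, so lines with equal scores keep their input order.
    return sorted(hits, key=lambda h: h.certainty, reverse=True)


block = [
    "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco",
    "bacterial cytoplasm Certainty=0 . 4532 (Affirmative) < suco",
    "bacterial outside Certainty=0.0000 (Not Clear) < succ>",
]
best = parse_final_results(block)[0]
print(best.site, best.certainty, best.call)  # -> bacterial cytoplasm 0.4532 Affirmative
```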

Example 1457 

A DNA sequence (GBSx1543) was identified in S.agalactiae <SEQ ID 4477> which encodes the amino
acid sequence <SEQ ID 4478>. This protein is predicted to be ribonucleotide reductase subunit R1E (nrdE).
Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3676 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD41036 GB:AF112535 ribonucleotide reductase alpha-chain 
[Corynebacterium glutamicum] 
Identities = 366/701 (52%) , Positives = 488/701 (69%) , Gaps = 19/701 (2%) 

Query: 23 NGQIPLHKDKEALTAFFKENVQPNSKAFDSITDKIAYLLKYDYLEEAFLNKYRPEFIEEL 82

NG+I KD+EA +F ++V N+ F ++ +KI YL++ Y + L+KY +FI++L 
Sbjct: 22 NGKIQFEKDREAANQYFLQHVNQNTVFFHNLQEKIDYLVENKYYDPIVLDKYDFQFIKDL 81 

Query: 83 STKLFDKKFRFKSFMAAYKFYQQYALKTNDGEYYLESIEDRVLFNALYFADGDEELATDL 142 
+ + KFRF+SF+ AYK+Y Y LKT DG YLE EDRV AL ADGD LA +L

Sbjct: 82 FKRAYGFKFRFQSFLGAYKYYTSYTLKTFDGRRYLERFEDRVCMVALTLADGDRALAENL 141 

Query: 143 ALEMISQRYQPATPSFLNAGRSRRGELVSCFLIQVTDDMNAIGRSINSALQLSRIGGGVG 202
E++S R+QPATP+ FLN+G+ + +RGE VSCFL+++ D+M +IGRSINSALQLS+ GGGV 
Sbjct: 142 VDEIMSGRFQPATPTFLNSGKAQRGEPVSCFLLRIEDNMESIGRSINSALQLSKRGGGVA 201

Query: 203 ISLSNLREAGAPIKGFAGAASGVVPVMKLFEDSFSYSNQLGQRQGAGVVYLDVFHPDIIS 262

+ LSNLREAGAPIK +SGV+PVMKL ED+FSY+NQLG RQGAG VYL+ HPDI+S 

Sbjct: 202 LLLSNLREAGAPIKKIENQSSGVIPVMKLLEDAFSYANQLGARQGAGAVYLNAHHPDILS 261


Query: 263 FLSTKKENADEKVRVKTLSLGITVPDKFYELARNNQEMYLFSPYSIEREYGVPFSYIDIT 322 

FL TK+ENADEK+R+KTLSLG+ +PD +ELA+ N +MYLFSPY +ER YG PF+ + IT 
Sbjct: 262 FLDTKRENADEKIRIKTLSLGVVIPDITFELAKRNDDMYLFSPYDVERIYGKPFADVSIT 321

Query: 323 EKYDELVANPNITKTKINARDLETEISKLQQESGYPYIINIDTANRTNPVDGKIIMSNLC 382

E YDE+V + I KTKINAR ++++Q ESGYPYI+ DT N +NP++G+I SNLC 

Sbjct: 322 EHYDEMVDDDRIRKTKINARQFFQTLAEIQFESGYPYIMYEDTVNASNPIEGRITHSNLC 381 

Query: 383 SEILQVQKPSLINDAQEYLEMGTDISCNLGSTNVLNMMTSPDFGKSIKTMTRALTFVTDS 442
SEILQV PS ND Y E+G DISCNLGS NV M SP+F K+I+T R LT V++

Sbjct: 382 SEILQVSTPSEFNDDLTYAEVGEDISCNLGSLNVAMAMDSPNFEKTIETAIRGLTAVSEQ 441

Query: 443 SNIEAVPTIKNGNAQAHTFGLGAMGLHSYLAKNHIEYGSPESIEFTDIYFMLMNYWTLVE 502 
++I++VP+I+ GN AH GLG M LH Y + H+ YGS E+++FT+ YF + Y L 
Sbjct: 442 TSIDSVPSIRKGNFAAHAIGLGQMNLHGYFGREHMHYGSEEALDFTNAYFAAVLYQCLRA 501

Query: 503 SNNIARERQTTFVGFEKSKYADGTYFDKYVSGKFVPQSDKVKSLFA--NHFIPEAKDWEN 560

SN IA ER F FE SKYA G YFD + + F P+SDKVK LFA N P +DW 
Sbjct: 502 SNKIATERGERFKNFENSKYATGEYFDDFDANDFAPKSDKVKELFAKSNIHTPTVEDWAA 561 


Query: 561 LRYAVMKDGLYHQNRLAVAPNGSISYINDCSASIHPITQRIEERQEKKIGKIYYPANGLA 620

L+ VM+ GL+++N AV P GSISYIN+ ++SIHPI +IE R+E KIG++YYPA + 
Sbjct: 562 LKADVMEHGLFNRNLQAVPPTGSISYINNSTSSIHPIASKIEIRKEGKIGRVYYPAPHMD 621

Query: 621 TDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLSMTLFLRSELPKELYEWKTESKQTTRD 680
D + Y+ AY++ K+ID YA AT++VDQGLS+TLF + TTRD 
Sbjct: 622 NDNLEYFEDAYEIGYEKIIDTYAVATKYVDQGLSLTLFFK DTATTRD 668 
Query: 681 LSILRNYAFNKGVKSIYYI--RTFTDDGSEVGANQCESCVI 719

++ + YA+ KG+K++YYI R +G+EV + C SC++ 
Sbjct: 669 INRAQIYAWRKGIKTLYYIRLRQVALEGTEV--DGCVSCML 707 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4479> which encodes the amino acid
sequence <SEQ ID 4480>. Analysis of this protein sequence reveals the following:

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4241 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 628/719 (87%) , Positives = 682/719 (94%) 



Query: 1 MSLKNIGDVSYFRLNNEINRPVNGQIPLHKDKEALTAFFKENVQPNSKAFDSITDKIAYL 60
MSLK++GD+SYFRLNNEINRPVNG+IPLHKDKEAL AF  ENV PN+ +F SIT+KI YL
Sbjct: 1 MSLKDLGDISYFRLNNEINRPVNGKIPLHKDKEALKAFSAENVLPNTMSFTSITEKIEYL 60

Query: 61 LKYDYLEEAFLNKYRPEFIEELSTKLFDKKFRFKSFMAAYKFYQQYALKTNDGEYYLESI 120
+  DY+E AF+ KYRPEFI EL + +  + FRFKSFMAAYKFYQQYALKTNDGE+YLE++
Sbjct: 61 ISNDYIESAFIQKYRPEFITELDSIIKSENFRFKSFMAAYKFYQQYALKTNDGEHYLENL 120

Query: 121 EDRVLFNALYFADGDEELATDLALEMISQRYQPATPSFLNAGRSRRGELVSCFLIQVTDD 180
EDRVLFNALYFADG E+LA DLA+EMI+QRYQPATPSFLNAGRSRRGELVSCFLIQVTDD
Sbjct: 121 EDRVLFNALYFADGQEDLAKDLAVEMINQRYQPATPSFLNAGRSRRGELVSCFLIQVTDD 180

Query: 181 MNAIGRSINSALQLSRIGGGVGISLSNLREAGAPIKGFAGAASGVVPVMKLFEDSFSYSN 240
MN+IGRSINSALQLSRIGGGVGI+LSNLREAGAPIKG+AGAASGVVPVMKLFEDSFSYSN
Sbjct: 181 MNSIGRSINSALQLSRIGGGVGITLSNLREAGAPIKGYAGAASGVVPVMKLFEDSFSYSN 240

Query: 241 QLGQRQGAGVVYLDVFHPDIISFLSTKKENADEKVRVKTLSLGITVPDKFYELARNNQEM 300
QLGQRQGAGVVYL+VFHPDII+FLSTKKENADEKVRVKTLSLGITVPDKFYELAR N++M
Sbjct: 241 QLGQRQGAGVVYLNVFHPDIIAFLSTKKENADEKVRVKTLSLGITVPDKFYELARKNEDM 300

Query: 301 YLFSPYSIEREYGVPFSYIDITEKYDELVANPNITKTKINARDLETEISKLQQESGYPYI 360
YLFSPY++E+EYG+PF+Y+DIT  YDELVANP ITKTKI ARDLETEISKLQQESGYPYI
Sbjct: 301 YLFSPYNVEKEYGIPFNYLDITNMYDELVANPKITKTKIKARDLETEISKLQQESGYPYI 360

Query: 361 INIDTANRTNPVDGKIIMSNLCSEILQVQKPSLINDAQEYLEMGTDISCNLGSTNVLNMM 420
INIDTAN+ NP+DGKIIMSNLCSEILQVQ PSLINDAQE++EMGTDISCNLGSTN+LNMM
Sbjct: 361 INIDTANKANPIDGKIIMSNLCSEILQVQTPSLINDAQEFVEMGTDISCNLGSTNILNMM 420

Query: 421 TSPDFGKSIKTMTRALTFVTDSSNIEAVPTIKNGNAQAHTFGLGAMGLHSYLAKNHIEYG 480
TSPDFG+SIKTMTRALTFVTDSS+IEAVPTIK+GN+QAHTFGLGAMGLHSYLA++HIEYG
Sbjct: 421 TSPDFGRSIKTMTRALTFVTDSSSIEAVPTIKHGNSQAHTFGLGAMGLHSYLAQHHIEYG 480

Query: 481 SPESIEFTDIYFMLMNYWTLVESNNIARERQTTFVGFEKSKYADGTYFDKYVSGKFVPQS 540
SPESIEFTDIYFML+NYWTLVESNNIARERQTTFVGFE SKYA+G+YFDKYV+G FVP+S
Sbjct: 481 SPESIEFTDIYFMLLNYWTLVESNNIARERQTTFVGFENSKYANGSYFDKYVTGHFVPKS 540

Query: 541 DKVKSLFANHFIPEAKDWENLRYAVMKDGLYHQNRLAVAPNGSISYINDCSASIHPITQR 600
D VK LF +HFIP+A DWE LR AV KDGLYHQNRLAVAPNGSISYINDCSASIHPITQR
Sbjct: 541 DLVKDLFKDHFIPQASDWEALRDAVQKDGLYHQNRLAVAPNGSISYINDCSASIHPITQR 600

Query: 601 IEERQEKKIGKIYYPANGLATDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLSMTLFLR 660
IEERQEKKIGKIYYPANGL+TDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLS+TLFLR
Sbjct: 601 IEERQEKKIGKIYYPANGLSTDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLSLTLFLR 660



Query: 661 SELPKELYEWKTESKQTTRDLSILRNYAFNKGVKSIYYIRTFTDDGSEVGANQCESCVI 719 

SELP ELYEWKT+SKQTTRDLSILRNYAFNKG+KSIYYIRTFTDDG EVGANQCESCVI 
Sbjct: 661 SELPMELYEWKTQSKQTTRDLSILRNYAFNKGIKSIYYIRTFTDDGEEVGANQCESCVI 719 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1458 

A DNA sequence (GBSx1544) was identified in S.agalactiae <SEQ ID 4481> which encodes the amino
acid sequence <SEQ ID 4482>. This protein is predicted to be ribonucleotide reductase subunit R2F (nrdB). 
Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4583 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9753> which encodes amino acid sequence <SEQ ID 9754> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC14561 GB:AF050168 ribonucleoside diphosphate reductase small subunit [Corynebacterium ammoniagenes]

Identities = 166/313 (53%) , Positives = 215/313 (68%) , Gaps = 1/313 (0%) 

Query: 10 EAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLSAQEKDLVGKVFGGLTLL 69
+AINWN ID D W++LT FWL +IP+SND+ W K++ QE+ +VF GLTLL 
Sbjct: 17 KAINWNVIPDEKDLEVWDRLTGNFWLPEKIPVSNDIQSWNKMTPQEQLATMRVFTGLTLL 76

Query: 70 DTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTKSEIEEIFEWTN 129

DT+Q G ++ DV T HEE V NI FMES VHAKSYS+IF TL + +I E F W+
Sbjct: 77 DTIQGTVGAISLLPDVETMHEEGVYTNIAFMESVHAKSYSNIFMTLASTPQINEAFRWSE 136 


Query: 130 NNEFLQEKARIINDIYANGNALQKKVASTYLETFLFYSGFFTPLYYLGNNKLANVAEIIK 189

NE LQ KA+II Y + L+KKVAST LE+FLFYSGF+ P+Y KL N A+II+ 

Sbjct: 137 ENENLQRKAKIIMSYYNGDDPLKKKVASTLLESFLFYSGFYLPMYLSSRAKLTNTADIIR 196

Query: 190 LIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLYDGVGW 249

LI IRDESVHG YIGYK+Q G +L E EQE ++ + +DL+Y LYENE +YT+ +YD +GW 
Sbjct: 197 LIIRDESVHGYYIGYKYQQGVKKLSEAEQEEYKAYTFDLMYDLYENEIEYTEDIYDDLGW 256 

Query: 250 TEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGIS-TGTSNHDFFSQVGNGYLL 308 
TE+V FLRYNANKAL NLG + LFP V+P +++ +S NHDFFS G+ Y++

Sbjct: 257 TEDVlOlFLRYNANKALNNLGYEGLFPTDETKVSPAILSSLSPNADENHDFFSGSGSSYVI 316

Query: 309 GSVEAMHDDDYNY 321 
G E DDD+++ 
Sbjct: 317 GKAEDTTDDDWDF 329

A related DNA sequence was identified in S.pyogenes <SEQ ID 4483> which encodes the amino acid 
sequence <SEQ ID 4484>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4583 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 315/319 (98%) , Positives = 316/319 (98%)





Query: 5 MTTYYEAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLSAQEKDLVGKVFG 64
MTTYYEAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLS QEKDLVGKVFG
Sbjct: 1 MTTYYEAINWNEIEDVIDKSTWEKLTEQFWLDT 60

Query: 65 GLTLLDTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTKSEIEEI 124
GLTLLDTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTK EIEEI
Sbjct: 61 GLTLLDTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTKKEIEEI 120

Query: 125 FEWTNNNEFLQEKARIINDIYANGNALQKKVASTYLETFLFYSGFFTPLYYLGNNKLANV 184
FEWTNNNEFLQEKARIINDIYANG+ALQKKVASTYLETFLFYSGFFTPLYYLGNNKLANV
Sbjct: 121 FEWTNNNEFLQEKARIINDIYANGDALQKKVASTYLETFLFYSGFFTPLYYLGNNKLANV 180

Query: 185 AEIIKLIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLY 244
AEIIKLIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLY
Sbjct: 181 AEIIKLIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLY 240

Query: 245 DGVGWTEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGISTGTSNHDFFSQVGN 304
DGVGWTEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGISTGTSNHDFFSQVGN
Sbjct: 241 DGVGWTEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGISTGTSNHDFFSQVGN 300

Query: 305 GYLLGSVEAMHDDDYNYGL 323
GYLLGSVEAM DDDYNYGL
Sbjct: 301 GYLLGSVEAMSDDDYNYGL 319





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1459 

A DNA sequence (GBSx1545) was identified in S.agalactiae <SEQ ID 4485> which encodes the amino
acid sequence <SEQ ID 4486>. Analysis of this protein sequence reveals the following: 
Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 50 - 66 ( 50 - 66) 

Final Results 

bacterial membrane Certainty=0.1107 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
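Each INTEGRAL line gives a likelihood score, a transmembrane segment and a second range in parentheses. In every such line in this section the first range covers exactly 17 residues (50 - 66 in the example above), while the parenthesized range varies; the text does not say how the wider range is derived, so the sketch below only records it. A minimal parser, assuming the line layout shown above (`TMSegment` and `parse_tm_line` are illustrative names, not part of the prediction program):

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical pattern for one "INTEGRAL" line as laid out in this document.
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


@dataclass(frozen=True)
class TMSegment:
    likelihood: float
    core: Tuple[int, int]      # first range, e.g. (50, 66)
    extended: Tuple[int, int]  # range given in parentheses

    @property
    def core_length(self) -> int:
        start, end = self.core
        return end - start + 1  # ranges are inclusive residue numbers


def parse_tm_line(line: str) -> TMSegment:
    m = _TM.search(line)
    if m is None:
        raise ValueError(f"not an INTEGRAL line: {line!r}")
    lik, a, b, c, d = m.groups()
    return TMSegment(float(lik), (int(a), int(b)), (int(c), int(d)))


seg = parse_tm_line("INTEGRAL Likelihood = -0.27 Transmembrane 50 - 66 ( 50 - 66)")
print(seg.likelihood, seg.core, seg.core_length)  # -> -0.27 (50, 66) 17
```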

Example 1460 

A DNA sequence (GBSx1546) was identified in S.agalactiae <SEQ ID 4487> which encodes the amino
acid sequence <SEQ ID 4488>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -14.38 Transmembrane 176 - 192 ( 168 - 201)
INTEGRAL Likelihood = -4.57 Transmembrane 25 - 41 ( 22 - 42)
INTEGRAL Likelihood = -3.88 Transmembrane 94 - 110 ( 94 - 112)
INTEGRAL Likelihood = -1.49 Transmembrane 70 - 86 ( 70 - 86)
INTEGRAL Likelihood = -1.01 Transmembrane 128 - 144 ( 128 - 144)



Final Results 
bacterial membrane Certainty=0.6753 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9751> which encodes amino acid sequence <SEQ ID 9752>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 






>GP:CAB15077 GB:Z99119 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 55/184 (29%) , Positives = 98/184 (52%) , Gaps = 4/184 (2%) 

Query: 16 MSKNNNTTCLIETAI FAALAMALSMI P DFASWFTPSFGAIPLILFALRRGTKYGLF 71 

M+++ LIE AI A A+ L ++ + S IP+ L + R G K GL 

Sbjct: 1 MNQSKQLVRLIEIAIMTAAAVILDIVSGMFLSMPQGGSVSIMMIPIFLISFRWGVKAGLT 60 

Query: 72 AGLIWGLLHFVLSKVYYLSLSQVFIEYILAFISMGLAGVFSAKFKDALSSSSKTKALSLA 131

GL+ GL+ + ++ Q+ ++YI+AF ++G++G F++ + A S +K K + 

Sbjct: 61 TGLLTGLVQIAIGNLFAQHPVQLLLDYIVAFAAIGISGCFASSVRKAAVSKTKGKLIVSV 120

Query: 132 LSGAILATLVRYVWHYIAGVIFWASYAPKGMSATLYSLSVNGTAGLLTLFFVVISIIILV 191
+S + +L+RY H I+G +F+ S+APKG +YSL+ NT + + I + +L

Sbjct: 121 VSAVFIGSLLRYAAHVISGAVFFGSFAPKGTPVWIYSLTYNATYMVPSFIICAIVLCLLF 180 

Query: 192 ISYP 195 
++ P 

Sbjct: 181 MTAP 184

A related DNA sequence was identified in S.pyogenes <SEQ ID 4489> which encodes the amino acid 
sequence <SEQ ID 4490>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.34 Transmembrane 162 - 178 ( 156 - 183) 
INTEGRAL Likelihood = -9.34 Transmembrane 110 - 126 ( 107 - 130) 
INTEGRAL Likelihood = -1.22 Transmembrane 55 - 71 ( 55 - 71) 

Final Results

bacterial membrane Certainty=0.4736 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15077 GB:Z99119 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 55/189 (29%) , Positives = 100/189 (52%) , Gaps = 10/189 (5%) 

Query: 1 MSPNTNVKYLIEAAIFAALAMTLSFIPDFAGWF--SPSYGAIALV IFSLRRGLKY 53

M+ + + LIE AI A A+ L + +G F P G+++++ + S R G+K

Sbjct: 1 MNQSKQLVRLIEIAIMTAAAVILDIV SGMFLSMPQGGSVSIMMIPIFLISFRWGVKA 57

Query: 54 GMLAGLIWGLLHFVLGKVYYLSMSQVFIEYILAFTSMGLAGSFSDSLIKTLRRQQTFFAV 113 
G+ GL+ GL+ +G ++ Q+ ++YI+AF ++G++G F+ S+ K + + 

Sbjct: 58 GLTTGLLTGLVQIAIGNLFAQHPVQLLLDYIVAFAAIGISGCFASSVRKAAVSKTKGKLI 117

Query: 114 FLAIMASLLAVTVRYLWHFMAGIIFWGSYAPKGMSAVWYSFSVNGTAGVLTFLITCLALM 173

+ A + +RY H ++G +F+GS+APKG YS + N T V +F+I + L 

Sbjct: 118 VSVVSAVFIGSLLRYAAHVISGAVFFGSFAPKGTPVWIYSLTYNATYMVPSFIICAIVLC 177






Query: 174 IALPIHPQL 182 

+ P+L 
Sbjct: 178 LLFMTAPRL 186 



An alignment of the GAS and GBS proteins is shown below.

Identities = 116/186 (62%) , Positives = 138/186 (73%) 
Query: 16 MSKNNNTTCLIETAIFAALAMALSMIPDFASWFTPSFGAIPLILFALRRGTKYGLFAGLI 75
MS N N   LIE AIFAALAM LS IPDFA WF+PS+GAI L++F+LRRG KYG+ AGLI
Sbjct: 1 MSPNTNVKYLIEAAIFAALAMTLSFIPDFAGWFSPSYGAIALVIFSLRRGLKYGMLAGLI 60

Query: 76 WGLLHFVLSKVYYLSLSQVFIEYILAFISMGLAGVFSAKFKDALSSSSKTKALSLALSGA 135
WGLLHFVL KVYYLS+SQVFIEYILAF SMGLAG FS      L       A+ LA+  +
Sbjct: 61 WGLLHFVLGKVYYLSMSQVFIEYILAFTSMGLAGSFSDSLIKTLRRQQTFFAVFLAIMAS 120

Query: 136 ILATLVRYVWHYIAGVIFWASYAPKGMSATLYSLSVNGTAGLLTLFFVVISIIILVISYP 195
+LA  VRY+WH++AG+IFW SYAPKGMSA  YS SVNGTAG+LT     ++++I +  +P
Sbjct: 121 LLAVTVRYLWHFMAGIIFWGSYAPKGMSAVWYSFSVNGTAGVLTFLITCLALMIALPIHP 180

Query: 196 SFFLPK 201 
F PK 

Sbjct: 181 QLFDPK 186 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1461 

A DNA sequence (GBSx1547) was identified in S.agalactiae <SEQ ID 4491> which encodes the amino
acid sequence <SEQ ID 4492>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -7.43 Transmembrane 206 - 222 ( 199 - 223)
INTEGRAL Likelihood = -6.64 Transmembrane 24 - 40 ( 19 - 42)
INTEGRAL Likelihood = -6.58 Transmembrane 61 - 77 ( 51 - 78)
INTEGRAL Likelihood = -6.58 Transmembrane 134 - 150 ( 132 - 154)
INTEGRAL Likelihood = -4.62 Transmembrane 226 - 242 ( 224 - 245)
INTEGRAL Likelihood = -3.72 Transmembrane 107 - 123 ( 106 - 125)

Final Results

bacterial membrane Certainty=0.3972 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9749> which encodes amino acid sequence <SEQ ID 9750>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4493> which encodes the amino acid
sequence <SEQ ID 4494>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -10.46 Transmembrane 134 - 150 ( 131 - 159)
INTEGRAL Likelihood = -7.59 Transmembrane 107 - 123 ( 103 - 128)
INTEGRAL Likelihood = -7.48 Transmembrane 225 - 241 ( 213 - 248)
INTEGRAL Likelihood = -7.22 Transmembrane 205 - 221 ( 199 - 224)
INTEGRAL Likelihood = -3.56 Transmembrane 50 - 66 ( 50 - 73)
INTEGRAL Likelihood = -1.28 Transmembrane 16 - 32 ( 16 - 33)



Final Results

bacterial membrane Certainty=0.5182 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below.

Identities = 82/253 (32%) , Positives = 149/253 (58%) , Gaps = 5/253 (1%) 

Query: 6 IKQSDTTFVRIIKSLLIGGFIGAILGSVGALFIIF--GQDKYLSEI--NIVQYFLWVSRI 61
+K+ +F+R++K L+ G I+G + F+ + G+ +L+ + +++ + ++R+

Sbjct: 1 MKKKKNSFLRLLKMSLLSSLAGGIIGGMVGAFLGYHGGRLDHLTFLKDDVINLIILLNRL 60

Query: 62 WIITALFSLiyLYQIQKYQKVFFNVDESQ-SEEIYRQINLRHSYGMTFVSISIVLSIVN 120
W+ S ++L Q++K V+ ++E SE YRQ+N +H+Y M ++++ +LS+ N 

Sbjct: 61 VA7WDLTLSWFLTQLKKETAVYNTIEEDDISENGYRQLNKKHAYTMLLIAVASILSMCN 120

Query: 121 TLFNYKLNIFDDSVTLVIPIYDLSLLFVLLGLHIYFLKVYRNIRGIKMTVAPTLKELKNN 180

L L L IP+ D+ LL +++ +K Y IRG + P LKELK+N 

Sbjct: 121 VLLGLTLmSSQHAMLAIPLLDILLLLWIPFQALAMKRYNAIRGTDVPYFPNLKELKHN 180 


Query: 181 VLQLDEAELESNYKMCFDIVMNLSGFIFPTIYFVLFFISFVFQKVEIVAIIITTSIHIYI 240 

++ LDEAEL++ +K F+ V++L+G I P++Y +LFF+ +VE+ AI++ I +Y+ 

Sbjct: 181 IMALDEAELQAYHKTSFESVLSLNGVIIPSLYVILFFVYLFTGQVELTAILVLVLIQLYL 240 

Query: 241 LIKSLKAARHFYR 253

L+KS R FYR 
Sbjct: 241 LVKSATMTRQFYR 253 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1462 

A DNA sequence (GBSx1548) was identified in S.agalactiae <SEQ ID 4495> which encodes the amino
acid sequence <SEQ ID 4496>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.5172 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1463 

A DNA sequence (GBSx1549) was identified in S.agalactiae <SEQ ID 4497> which encodes the amino
acid sequence <SEQ ID 4498>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2059 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76650 GB:AE000440 UDP-D-glucose: (galactosyl) lipopolysaccharide glucosyltransferase [Escherichia coli K12]
Identities = 70/256 (27%) , Positives = 121/256 (46%) , Gaps = 14/256 (5%) 

Query: 1 MNLLFSIDDMYVDHFICVMLYSLVRQTKNRKLEIYVLQKT LLKRHTELIQYTQNLEV 56

+N+ + +D Y+D V + S+V ++ L+ Y++ ++ +L + Q

Sbjct: 28 IiNVAYGVDAKTYLDGVGVSITSIvIiNKIRH^ 87 

Query: 57 GYHPIIVGTEVFAQAPTTDRYPDTIYYRLLAHKFLPETLDRILYLDADMLCLNDFSSLYD 116 
Y + T+ P T + +Y+RL A + L TLDR+LYLDAD++C DSL 

Sbjct: 88 LYR---INTDKLQCLPCTQWSRAMYFR1.FAFQLLGLTLDRLLYLDADVVCKGD1SQLLH 144

Query: 117 MELGDQLYAAASHNTDGKFLDYWKLRLKNVELESSYFNTGVLLMNLPAIRKVVHQQTIL 176 

+ L A A+ D + + RL + EL YFN+GV+ ++L + L 

Sbjct: 145 LGLNG AVAAVVKDVEPMQEKAVSRLSDPELIjGQYFNSGvvYLDLiKKWADAKLTEKMj 201 


Query: 177 DYIMQNRGRLILPDQDIINGLYANLVKPIPDEIYNYDARYSLIYQLKSRNEWDLEWINH 236 

+M PDQD++N L + +P E Y+ Y++ +LK + + + +1 
Sbjct: 202 SILMSKDNVYKYPDQDVMNVLLKGMTLFLPRE YNTIYTIKSELKDKTHQNYKKLITE 258 

Query: 237 -TVFLHFAGRDKPWKK 251

T+ +H+ G KPW K 
Sbjct: 259 STLLIHYTGATKPWHK 274 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1464 

A DNA sequence (GBSx1550) was identified in S.agalactiae <SEQ ID 4499> which encodes the amino
acid sequence <SEQ ID 4500>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1406 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1465 

A DNA sequence (GBSx1551) was identified in S.agalactiae <SEQ ID 4501> which encodes the amino
acid sequence <SEQ ID 4502>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have an uncleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.5288 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07774 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 84/242 (34%) , Positives = 147/242 (60%) , Gaps = 16/242 (6%) 

Query: 1 WGLGTVINVILIIVGGEVGIiFLKNFLKESLQKSLMQAMGVAVTiFISISGVIjEKMMLVEK 60 

MV +GTV+N I++ +GL +KN + E ++ +LMQA+G+A++ + + KM L + 

Sbjct: 1 MVLIGTVWGAAIVIAALIGLLVKN-IPERvTCTTLMQAIGIAIVLLGV KMGLQTE 54 

Query: 61 SHLISNHTNMMIITLALGTVXiGELLSLDSYIDKFGOTLKQKTGSGNDIKFVEAFVTSTCT 120 

LI +1 +L +G V+GE+++L+ +D G +++ KG D AFVT+T 

Sbjct: 55 QFLI VICSLVIGGVIGEMINLEKRLDHLGRWIESKVGGKKDGSIATAFVTTTLI 108 

Query: 121 VCIGAMAWGSIQDGIAADHSILFAKGMLDMIIIAIMTVSLGKGALFSALPVALLQGSLT 180 

+GAMAV+G++ G+ DHS+L K +LD + + T +LG G LFSA+PV L QGS+ 
Sbjct: 109 YWGA^VLGALDSGLRGDHSVIIjTKALLIDGFIAILFTSTLGIGVI.FSAIPVVLYQGSIA 168 

Query: 181 IVAF FMGSLLNPSSLDYLNLVGNMLIFCTGVNLLFNLNIKVINMLPAIILAILWGS 236 

+ A ++ + LS+++G ++I +G+NLL +NI+V N+LP++++ + + 

Sbjct: 169 LFASQIDQYVPTALMDSFITEMSATGGVMIVAIGLNLLNVWIRVANLLPSLVIVAVLVT 228 

Query: 237 FI 238 
F+ 

Sbjct: 229 FV 230 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1466 

A DNA sequence (GBSx1552) was identified in S.agalactiae <SEQ ID 4503> which encodes the amino
acid sequence <SEQ ID 4504>. This protein is predicted to be alanyl-tRNA synthetase (alaS). Analysis of 
this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.41 Transmembrane 805 - 821 ( 804 - 822) 



Final Results 

bacterial membrane Certainty=0.2753 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04986 GB:AP001511 alanyl-tRNA synthetase [Bacillus halodurans] 
Identities = 482/885 (54%) , Positives = 618/885 (69%) , Gaps = 27/885 (3%) 

Query: 1 MKELSSAQIRQMWLDFWKSKGHSVEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN 60 

MK L+SAQ+RQM+LDF+K KGH VEPSA+LVP +DP+LLWINSGVATLKKYFDG VIPEN
Sbjct: 1 MKYLTSAQVRQMFLDFFKEKGHDVEPSASLVPHDDPSLLWINSGVATLKKYFDGRVIPEN 60 

Query: 61 PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSPEWFDF 120 

PRITNAQKSIRTNDIENVGKTARHHT FEMLGNFSIGDYF++EAIEW +E LTS +W F 
Sbjct: 61 PRITNAQKSIRTNDIENVGKTARHHTFFEMLGNFSIGDYFKEEAIEWAWEFLTSEKWIGF 120 



Query: 121 PKDKLYMTYYPDDKDSYNRWIA-CGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDF 179
K+KL +T +P+D ++Y+ W G+ ++ +E NFW+IG GPSGP+TEIF+DRG ++
Sbjct: 121 DKEKLSVTVHPEDDEAYSYWKEKIGIPEERIIRLEGNBTOIGEGPSGPNTEIFYDRGPEY 180 

Query: 180 DPENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPNKNIDTGAGL 234

DPE L ENDRY+E+WN+V SQFN +P Y LP KNIDTG GL

Sbjct: 181 GDQPNDPE LYPGGENDRYLEVWNLVFSQFNHNPD GSYTPLPKKNIDTGMGL 231

Query: 235 ERLAAVMQGAKTNFETDLFMPIIREVEKLSGKTYDPDGD-NMSFKVIADHIRALSFAIGD 293
ER+ +V+Q TNFETDLFMPIIR EK+SG Y + ++SFKVIADHIR ++FAIGD 
Sbjct: 232 ERMVSVIQNVPTNFETDLFMPIIRATEKISGTEYGSHHEADVSFKVIADHIRTVTFAIGD 291

Query: 294 GALPGNEGRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIE 353 

GALP NEGRGYVLRRLLRRAV + +++GI+ F+Y+LVP VG IM +YPEV EK FI+ 
Sbjct: 292 GALPSNEGRGYVLRRLLRRAVRYAKQIGIDRPFMYELVPVVGDIMVDFYPEVKEKAAFIQ 351


Query: 354 KIVKREEETFARTIDAGSGHLDSLLAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAE 413 

K+VK EEE F T++ G L+ ++ + K+EG T+ G D+F+LYDTYGFPV+LTEE E 
Sbjct: 352 KVVKTEEERFHETLNEGLSILEKVIDKAKSEGASTISGSDVFRLYDTYGFPVDLTEEYVE 411

Query: 414 DAGYKIDHEGFKSAMKEQQDRARAAVVKGGSMGMQNETLAGIVEESRF-EYDTYSLESSL 472

+ G ++D +GF++ M+ Q++RAR A + GSM +Q+E L I +S F Y S E+++ 
Sbjct: 412 EQGLQVDLDGFEAEMERQRERARTARQQAGSMQVQDEVLGQITVDSTFIGYKQLSTETTI 471 

Query: 473 SVIIADNERTEAVSEGQ-ALLVFAQTPFYAEMGGQVADHGVIKNDKGDTVAEVVDVQKAP 531
I+ D + V GQ A ++ +TPFYAE GGQVAD G+I+ G V V DVQKAP

Sbjct: 472 ETIVLDKTVADYVGAGQFAKVILKETPFYAESGGQVADKGIIRGANGFAV--VSDVQKAP 529

Query: 532 NGQPLHTVNVL-ASLSVGTNYTLEINKERRLAVEKNHTATHLLHAALHNVIGEHATQAGS 590 
NGQ LHTV V +L V + + R + KNHTATHLLH AL +V+GEH QAGS 

Sbjct: 530 NGQHLHTVIVKEGTLQVNDQVQAIVEETERSGIVKNHTATHLLHRALKDVLGEHVNQAGS 589

Query: 591 LNEEEFLRFDFTHFEAVSNEELRHIEQEVNEQIWNDLTITTTETDVETAKEMGAMALFGE 650

L EE LRFDF+HF V++EE IE+ VNE+IW + + + ++ AK +GAMALFGE 
Sbjct: 590 LVSEERLRFDFSHFGQVTDEEKEKIERIVNEKIWQAIKVNISTRrLDEAKAIGAMALFGE 649


Query: 651 KYGKVVRVVQIGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAY 710

KYG +VRVV++G+YS+ELCGG H+ N+SEIGLFKIV E GIG+G RRI AVTG++AF
Sbjct: 650 KYGDIVRVVEVGDYSIELCGGCHVTNTSEIGLFKIVSESGIGAGVRRIEAVTGKEAFLFM 709

Query: 711 RNQEDALKEIAATVKAPQLKDAAAKVQALSDSLRDLQKENVELKEKAAAAAAGDVFKDIQ 770

Q D LKE AATVKA +KD +V+AL +R+LQ+EN L K AG + ++Q

Sbjct: 710 AKQLDLLKETAATVKAKNVKDVPTOVEALQQQIRELQRENESLNAKLGNMEAGSLVNEVQ 769 

Query: 771 EAKGVRFIASQVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDV--- 827
+ +GV +A + AD LR+ D KQ+ S V+VL A KVN+ VA TKD+

Sbjct: 770 KIEGVPVLAKAISGADMDGLRSIVDKLKQEIPSWIVLGTASEGKVNI-VAGVTKDLINK 828 

Query: 828 --HAGNMIKGLAPIVAGRGGGKPDMAMAGGSDASKIAELLAAVAE 870
HAG ++K +A G GGG+PDMA AGG K+ + L+ V E 
Sbjct: 829 GYHAGKLVKEVATRCGGGGGGRPDMAQAGGKQPEKLQDALSFVYE 873

A related DNA sequence was identified in S.pyogenes <SEQ ID 4505> which encodes the amino acid 
sequence <SEQ ID 4506>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.41 Transmembrane 805 - 821 ( 804 - 822) 

Final Results 

bacterial membrane Certainty=0.2763 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 






Identities = 862/870 (99%) , Positives = 864/870 (99%) 
Query: 1 MKELSSAQIRQMWLDFWKSKGHSVEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN 60

MKELSSAQIRQMWLDFWKSKGH VEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN
Sbjct: 1 MKELSSAQIRQMWLDFWKSKGHCVEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN 60 

Query: 61 PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSPEWFDF 120 

PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSP+WFDF 
Sbjct: 61 PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSPDWFDF 120 

Query: 121 PKDKLYMTYYPDDKDSYNRWIACGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDFD 180 

PKDKLYMTYYPDDKDSYNRWIACGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDFD 
Sbjct: 121 PKDKLYMTYYPDDKDSYNRWIACGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDFD 180 

Query: 181 PENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPNKNIDTGAGLERLAAV 240 

PENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPNKNIDTGAGLERLAAV
Sbjct: 181 PENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPNKNIDTGAGLERLAAV 240

Query: 241 MQGAKTNFETDLFMPIIREVEKLSGKTYDPDGDNMSFKVIADHIRALSFAIGDGALPGNE 300

MQGAKTNFETDLFMPIIREVEKLSGKTYDPDGDNMSFKVIADHIRALSFAIGDGALPGNE 
Sbjct: 241 MQGAKTNFETDLFMPIIREVEKLSGKTYDPDGDNMSFKVIADHIRALSFAIGDGALPGNE 300

Query: 301 GRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIEKIVKREE 360 

GRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIEKIVKREE 
Sbjct: 301 GRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIEKIVKREE 360

Query: 361 ETFARTIDAGSGHLDSLLAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAEDAGYKID 420 

ETFARTIDAGSGHLDSLLAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAEDAGYKID 
Sbjct: 361 ETFARTIDAGSGHLDSLIAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAEDAGYKID 420 

Query: 421 HEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRFEYDTYSLESSLSVIIADNE 480 

HEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRFEYDTYSLESSLSVIIADNE
Sbjct: 421 HEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRFEYDTYSLESSLSVIIADNE 480 

Query: 481 RTEAVSEGQALLVFAQTPFYAEMGGQVADHGVIKNDKGDTVAEVVDVQKAPNGQPLHTVN 540 

RTEAVSEGQALLVFAQTPFYAEMGGQVAD G IKNDKGDTVAEVVDVQKAPNGQPLHTVN
Sbjct: 481 RTEAVSEGQALLVFAQTPFYAEMGGQVADTGRIKNDKGDTVAEVVDVQKAPNGQPLHTVN 540

Query: 541 VLASLSVGTNYTLEINKERRLAVEKNHTATHLLHAALHNVIGEHATQAGSLNEEEFLRFD 600 

VIASLSVGTNYTLEINKERRLAVEKNHTATHLLHAALHNVIGEHATQAGSLNEEEFLRFD 
Sbjct: 541 VIASLSVGTNYTLEINKERRLAVEKNHTATHLLHAALHNVIGEHATQAGSLNEEEFLRFD 600

Query: 601 FTHFFAVSNEELRHIEQEVNEQIV^LTITTTETDVETAKEMGAMALFGEKYGKVVRVVQ 660 

FTHFFAVSNEELRHIEQEVNEQIWN LTITTTETDVETAKEMGAMALFGEKYGKVVRVVQ
Sbjct: 601 FTHFFAVSNEELRHIEQEVNEQIWNALTITTTETDVETAKEMGAMALFGEKYGKVVRVVQ 660 

Query: 661 IGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAYRNQEDALKEI 720 

IGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAYRNQEDALKEI 
Sbjct: 661 IGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAYRNQEDALKEI 720 

Query: 721 AATVKAPQLKDAAAKVQALSDSLRDLQKENVELKEKAAAAAAGDVFKDIQEAKGVRFIAS 780 

AATVKAPQLKDAAAKVQALSDSLRDLQKEN ELKEKAAAAAAGDVFKD+QEAKGVRFIAS 
Sbjct: 721 AATVKAPQLKDAAAKVQALSDSLRDLQKENAELKEKAAAAAAGDVFKDVQEAKGVRFIAS 780 

Query: 781 QVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDVHAGNMIKGLAPIV 840

QVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDVHAGNMIK LAPIV 
Sbjct: 781 QVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDVHAGNMIKELAPIV 840 

Query: 841 AGRGGGKPDMAMAGGSDASKIAELLAAVAE 870 

AGRGGGKPDMAMAGGSDASKIAELLAAVAE 
Sbjct: 841 AGRGGGKPDMAMAGGSDASKIAELLAAVAE 870 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1467

A DNA sequence (GBSx1553) was identified in S.agalactiae <SEQ ID 4507> which encodes the amino
acid sequence <SEQ ID 4508>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2974 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9747> which encodes amino acid sequence <SEQ ID 9748> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15920 GB:Z99123 yxjl [Bacillus subtilis]

Identities = 42/144 (29%), Positives = 73/144 (50%), Gaps = 2/144 (1%)

Query: 17 IKEKMFSLGGKFTITDLTGLPCYHVEGSLFPLPKTFKVFDEEEHLISQIEKKVLSFLPKF 76 
+K+KMFS FID + VEG F L + ++ D + IE+K++S LP++ 

Sbjct: 6 MKQKMFSFKDAFHIYDRDEQETFKVEGRFFSLGDSLQMTDSSGKTLVSIEQKLMSLLPRY 65

Query: 77 NVTLANGNHFTIKKDFSFLKPHYTIEDLDMEVRGNFWDMDFQLLKDNQVIANISQQWFRM 136

+++ + K +F KP + I L+ E+ G+ W +FQL V ++S++W 

Sbjct: 66 EISIGGKWCEVTKKVTFSKPKFVISGLNWEIDGDLVn^EFQLTDGENVRMSVSKKWLSW 125 



Query: 137 TSTYQVEVYSETYNDLTISLVIAI 160

+Y +++ E D+ I IAI
Sbjct: 126 GDSYHLQIAYE--EDVLICTAIAI 147



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1468 

A DNA sequence (GBSx1554) was identified in S.agalactiae <SEQ ID 4509> which encodes the amino
acid sequence <SEQ ID 4510>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3833 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA36674 GB:AB016282 ORF17 [bacteriophage phi-105]

Identities = 45/133 (33%), Positives = 74/133 (54%), Gaps = 5/133 (3%)

Query: 2 KYTYIALFEVDKENGGYNISFPDFHGAFSEADSLNEAIFNAREVLEIYTIMFEDEGKEFP 61
+Y Y ALF+ D + G ++FPD G + +S EA+ A+E + ++ FE +G P 
Sbjct: 5 RYIYPALFDYDDD--GITVTFPDLPGCITFGNSGGEALTMAKEAMALHLYGFEQDGDIIP 62

Query: 62 KASSFKALASNIASDEDVIQAISVBTELWERERSKIvNKTvTLPSl^vEVGKENKVNFS 121 

+A+ K + A + + I R + V KT+T+P W+ ++ KE+KVN+S 

Sbjct: 63 EATPSKEIK AEESQSVVIjIETM^PPFRHDMENAAvTCKTLTIPRWMDDIAKEHKVNYS 119 


Query: 122 QLLQKAIREELQV 134

QLLQ+AI+E L + 
Sbjct: 120 QLLQEAIKEHLGI 132 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
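
The "Identities" and "Positives" figures reported with each alignment can be recomputed from the aligned strings. The Python sketch that follows is a minimal illustration, not the software used to generate these results. It assumes the usual BLAST midline convention: a letter marks an identical residue, "+" marks a conservative substitution with a positive score, and a space marks a mismatch or gap. The helper names `midline_counts` and `truncated_percent` are hypothetical. The example uses the final aligned segment of Example 1468 (Query 122-134 against Sbjct 120-132). Rounding percentages down is also an assumption. It does agree with, for instance, 73/144 being reported as 50% in Example 1467.

```python
def midline_counts(midline: str) -> tuple[int, int]:
    """Return (identities, positives) read from a BLAST-style midline.

    Assumed convention: a letter marks an identical residue, '+' marks a
    conservative substitution, and a space marks a mismatch or a gap.
    Positives include the identities.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def truncated_percent(n: int, total: int) -> int:
    """Whole-number percentage, rounded down (assumed reporting rule)."""
    return 100 * n // total


# Final aligned segment of Example 1468: Query 122-134 vs Sbjct 120-132.
query = "QLLQKAIREELQV"
midline = "QLLQ+AI+E L +"
subject = "QLLQEAIKEHLGI"

ident, pos = midline_counts(midline)

# Cross-check: identities are exactly the columns where both residues match.
assert ident == sum(a == b for a, b in zip(query, subject))

total = len(query)  # 13 aligned columns, no gaps in this segment
print(
    f"Identities = {ident}/{total} ({truncated_percent(ident, total)}%), "
    f"Positives = {pos}/{total} ({truncated_percent(pos, total)}%)"
)
```

Run as written, the script prints `Identities = 8/13 (61%), Positives = 11/13 (84%)`. These counts describe only this 13-column fragment, not the full alignment (45/133 identities) reported for Example 1468.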

Example 1469 

A DNA sequence (GBSx1555) was identified in S.agalactiae <SEQ ID 4511> which encodes the amino
acid sequence <SEQ ID 4512>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1484 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA25696 GB:AB010712 NADH oxidase/alkyl hydroperoxidase 
reductase [Streptococcus mutans] 
Identities = 383/509 (75%), Positives = 441/509 (86%)

Query: 1 MVLDKEIKAQLAQYLDLLESDIVLQADLGDNDNSQKVKDFLDEIVAMSDRISLESTHLKR 60

M LD EIK QL QYL LLES+IVLQA L D+ NSQKVK+FL EIVAMS ISLE L R
Sbjct: 1 MALDAEIKEQLGQYLQLLESEIVLQAQLKDDANSQKVKEFLQEIVAMSPMISLEEKELPR 60

Query: 61 QPSFGIAKKGHESRVIFSGLPMGHEFTSFILALLQVSGRAPKVDEDIIKRIKGIEKTINL 120

PSF IAKKG ES V F+GLP+GHEFTSFILALLQVSGR PKV+ DI+KRI+ +++ ++
Sbjct: 61 TPSFRIAKKGQESGVEFAGLPLGHEFTSFILALLQVSGRPPKVETDIVKRIQAVDEPMHF 120

Query: 121 ETWSLTCHNCPDWQAFNI^VLNPNITHTMIEGGMYQDEVKSKGIMSVPTVYKDQEEF 180

ETYVSLTCHNCPDWQAFNIM+V+NPNI+HTM+EGGM++DE+++KGIMSVPTVYKD EF
Sbjct: 121 ETWSLTCHNCPDWQAFNIMSVVNPNISHTMVEGGMFKDEIEAKGIMSVPTVYKDGTEF 180

Query: 181 TSGRATIEQLLEQLDGPLDAEAFADKGVYDVLVIGGGPAGNSAAIYAARKGLKTGILAET 240

TSGRA+IEQLL+ + GPL +AF DKGV+DVLVIGGGPAGNSAAIYAARKG+KTG+LAET
Sbjct: 181 TSGRASIEQLLDLIAGPLKEDAFDDKGVFDVLVIGGGPAGNSAAIYAARKGVKTGLLAET 240

Query: 241 FGGQVIETVGIENMIGTLYTEGPKLMAQIEEHTKSYDIDIIKSQLATGIEKKELVEVTLA 300

GGQV+ETVGIENMIGT Y EGP+LMAQ+EEHTKSY +DI+K+ A I+K +LVEV L
Sbjct: 241 MGGQVMETVGIENMIGTPYVEGPQLMAQVEEHTKSYSVDIMKAPRAKSIQKTDLVEVELD 300

Query: 301 NGAILQAKTAILALGAKWRNINVPGEEEFRNKGVTYCPHCDGPLFEGKDVAVIGGGNSGM 360

NGA L+AKTA+LALGAKWR INVPGE+EF NKGVTYCPHCDGPLF K VAVIGGGNSG+
Sbjct: 301 NGAHLKAKTAVLALGAKWRKINVPGEKEFFNKGVTYCPHCDGPLFTDKKVAVIGGGNSGL 360

Query: 361 FAALDLAGVTKHVTVLEFLPELKADQVLQERAAKTDNLTILKNVATKDIVGEDHVTGLNY 420

FAA+DLAG+ HV +LEFLPELKAD++LQ+RA DN+TIL NVATK+I+G DHV GL Y
Sbjct: 361 FAAIDLAGLASHVYILEFLPELKADKILQDRAEALDNITILTNVATKEIIGNDHVEGLRY 420

Query: 421 TDRDTNEEKHIDLEGVFVQIGLVPSTSWLKDSGIELNERQEIVVDKFGSTNIPGIFAAGD 480

+DR TNEE +DLEGVFVQIGLVPST WLKDSG+ LNE+ EI+V K G+TNIP IFAAGD
Sbjct: 421 SDRTTNEEYLLDLEGVFVQIGLVPSTDWLKDSGLALNEKGEIIVAKDGATNIPAIFAAGD 480

Query: 481 CTDAAYKQIIISMGSGATAAIGAFDYLIR 509

CTD+AYKQIIISMGSGATAA+GAFDYLIR
Sbjct: 481 CTDSAYKQIIISMGSGATAALGAFDYLIR 509

A related DNA sequence was identified in S.pyogenes <SEQ ID 4513> which encodes the amino acid
sequence <SEQ ID 4514>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0S54 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 419/510 (82%), Positives = 472/510 (92%)

Query: 1 MVLDKEIKAQLAQYLDLLESDIVLQADLGDNDNSQKVKDFLDEIVAMSDRISLESTHLKR 60

M L +IK QLAQYL LLE+D+VLQ LGDN+ SQKVKDF+ +E I AMS+RIS+E+ L R
Sbjct: 1 [sequence illegible in source] 60

Query: 61 QPSFGIAKKGHESRVIFSGLPMGHEFTSFILALLQVSGRAPKVDEDIIKRIKGIEKTINL 120

QPSF +AKKGH S V+F+GLP+GHE TSFILALLQVSGRAPKVD+D+I RIK I++ ++
Sbjct: 61 QPSFKVRKKGHQSGVVFAGLPLGHELTSFILALLQVSGRAPKVDQDVIDRIKAIDRPLHF 120

Query: 121 ETWSLTCHNCPDWQAFNII^VLNPNITHTMIEGGMYQDEVKSKGIMSVPTVYKDQEEF 180

ETYVSLTCHNCPDWQA NIM+VLN I+HTM+EGGM+QDEVK+KGIMSVPTV+ D EEF
Sbjct: 121 [sequence illegible in source] 180

Query: 181 TSGRATIEQLLEQLDGPLDAEAFADKGVYDVLVIGGGPAGNSAAIYAARKGLKTGILAET 240

TSGRATIEQLLEQ+ GPL  EAFADKG+YDVLVIGGGPAGNSAAIYAARKGLKTG+LAET
Sbjct: 181 TSGRATIEQLLEQIAGPLSEEAFADKGLYDVLVIGGGPAGNSAAIYAARRGLKTGLLAET 240

Query: 241 FGGQVIETVGIENMIGTLYTEGPKLMAQIEEHTKSYDIDIIKSQLATGIEKKELVEVTLA 300

FGGQV+ETVGIENMIGTLYTEGPKLMA++E HTKSYD+DIIK+QLAT IEKKE +EVTLA
Sbjct: 241 [sequence illegible in source] 300

Query: 301 NGAILQAKTAILALGAKWRNINVPGEEEFRNKGVTYCPHCDGPLFEGKDVAVIGGGNSGM 360

NGA+LQAKTAILALGAKWRNINVPGE+EFRNKGVTYCPHCDGPLFEGKDVAVIGGGNSG+
Sbjct: 301 NGAVLQAKTAILALGAKWRNINVPGEDEFRNKGVTYCPHCDGPLFEGKDVAVIGGGNSGL 360

Query: 361 FAALDLAGVTKHVTVLEFLPELKADQVLQERAAKTDNLTILKNVATKDIVGEDHVTGLNY 420

FAALDLAG+ KHV VLEFLPELKAD+VLQ+RAAKT+N+TI+KNVATKDIVGEDHVTGLNY
Sbjct: 361 FAALDLAGLAKHVYVLEFLPELKADKVLQDRAAKTNNMTIIKNVATKDIVGEDHVTGLNY 420

Query: 421 TDRDTNEEKHIDLEGVFVQIGLVPSTSWLKDSGIELNERQEIVVDKFGSTNIPGIFAAGD 480

T+RD+ E+KH+DLEGVFVQIGLVP+T+WLKDSG+ L +R EI+VDK GSTNIPGIFAAGD
Sbjct: 421 TERDSGEDKHLDLEGVFVQIGLVPNTAWLKDSGVNLTDRGEIIVDKHGSTNIPGIFAAGD 480

Query: 481 CTDAAYKQIIISMGSGATAAIGAFDYLIRQ 510

CTD+AYKQIIISMGSGATAAIGAFDYLIRQ
Sbjct: 481 CTDSAYKQIIISMGSGATAAIGAFDYLIRQ 510





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1470 

A DNA sequence (GBSx1556) was identified in S.agalactiae <SEQ ID 4515> which encodes the amino
acid sequence <SEQ ID 4516>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2906 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA25695 GB:AB010712 alkyl hydroperoxidase [Streptococcus mutans]
Identities = 167/186 (89%), Positives = 179/186 (95%)

Query: 1 MSLVGKEIIEFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 60 

MSLVGKE++EFSAQAYH G+F+TV NEDVKGKWAVFCFYPADFSFVCPTELGDLQEQY T 
Sbjct: 1 MSLVGKEMVEFSAQAYHQGEFVTVNNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYAT 60 

Query: 61 LKSLDVEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQGFDVLGQDGLAQRG 120 

L+SL VEVYSVSTDTHFVHKAWHDDSDWGTITY MIGDPSH++SQGF+VLG+DGLAQRG 
Sbjct: 61 LQSLGVEVYSVSTDTHFVHKAWHDDSDWGTITYTMIGDPSHVLSQGFEVLGEDGLAQRG 120

Query: 121 TFIIDPDGVIQMMEINADGIGRDASTLIDKVRAAQYIRQHTGEVCPAKWKEGAETLTPSL 180

TFI+DPDG+IQMME+NADGIGRDASTLIDKVRAAQYIRQH GEVCPAKWKEGAETL PSL
Sbjct: 121 TFIVDPDGIIQMMEVNADGIGRDASTLIDKVRAAQYIRQHPGEVCPAKWKEGAETLKPSL 180

Query: 181 DLVGKI 186 
DLVGKI

Sbjct: 181 DLVGKI 186 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4517> which encodes the amino acid
sequence <SEQ ID 4518>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3022 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 173/186 (93%), Positives = 181/186 (97%)

Query: 1 MSLVGKEIIEFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 60 

MSL+GKEI EFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 
Sbjct: 1 MSLIGKEIAEFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 60 

Query: 61 LKSLDVEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQGFDVLGQDGLAQRG 120

LKSL VEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQ F+VLG+DGLAQRG 
Sbjct: 61 LKSLGVEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQAFEVLGEDGLAQRG 120 

Query: 121 TFIIDPDGVIQMMEINADGIGRDASTLIDKVRAAQYIRQHTGEVCPAKWKEGAETLTPSL 180 
TFI+DPDG+IQMMEINADGIGRDASTLIDK+ AAQY+R+H GEVCPAKWKEGAETLTPSL
Sbjct: 121 TFIVDPDGIIQMMEINADGIGRDASTLIDKIHAAQYVRKHPGEVCPAKWKEGAETLTPSL 180

Query: 181 DLVGKI 186 
DLVGKI 

Sbjct: 181 DLVGKI 186

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1471 

A DNA sequence (GBSx1557) was identified in S.agalactiae <SEQ ID 4519> which encodes the amino
acid sequence <SEQ ID 4520>. This protein is predicted to be 30S ribosomal protein S2 (rpsB). Analysis of 
this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.4462 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA50276 GB:X70925 30S ribosomal protein [Pediococcus 
acidilactici] 

Identities = 190/260 (73%), Positives = 226/260 (86%), Gaps = 4/260 (1%)

Query: 1 MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA 60

M+VISMKQLLEAGVHFGHQTRRWNPKM +IFTERNGI++IDLQ+TVKL D AY FV+D 
Sbjct: 1 MSVISMKQLLEAGVHFGHQTRRWNPKMKPFIFTERNGIYIIDLQKTVKLIDNAYNFVKDV 60

Query: 61 AANDAVILFVGTKKQAAEAVAEEAKRAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM 120 

AAND V+LFVGTKKQA A+ EEAKRAGQ+++NHRWLGGTLTNW TIQKRI RLK++K+M 
Sbjct: 61 AANDGVVLFVGTKKQAQTAIEEEAKRAGQFYVNHRWLGGTLTNWNTIQKRIKRLKDLKKM 120

Query: 121 EEEGTFELLPKKEVALLNKQRARLEKFLGGIEDMPRIPDVMYVVDPHKEQIAVKEAKKLG 180 

EE+GTF+ LPKKEVALLNKQ+ +LEKFLGGIEDMP IPDV++VVDP KEQIA+KEA+KL
Sbjct: 121 EEDGTFDRLPKKEVALLNKQKDKLEKFLGGIEDMPHIPDVLFVVDPRKEQIAIKEAQKLN 180 

Query: 181 IPVVAMVDTNADPDDIDVIIPANDDAIRAVKLITSKLADAVIEGRQGEDADV----DFAQ 236

IPVVAMVDTN DPD +DVIIP+NDDAIRAV+LITSK+ADAV+EGRQGED + + A+

Sbjct: 181 IPVVAMVDTNTDPDQVDVIIPSNDDAIRAVRLITSKMADAVVEGRQGEDDEAVQQEEVAE 240

Query: 237 EAQADSIEEIVEVVEGSNND 256

DS+E++ + VE +N+ 
Sbjct: 241 GVSKDSLEDLKKTVEEGSNE 260 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4521> which encodes the amino acid
sequence <SEQ ID 4522>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.4462 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 241/254 (94%), Positives = 248/254 (96%)

Query: 1 MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA 60

MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA
Sbjct: 1 MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA 60

Query: 61 AANDAVILFVGTKKQAAEAVAEEAKRAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM 120

AANDAVILFVGTKKQAAEAVA+EA RAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM
Sbjct: 61 AANDAVILFVGTKKQAAEAVADEATRAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM 120

Query: 121 EEEGTFELLPKKEVALLNKQRARLEKFLGGIEDMPRIPDVMYVVDPHKEQIAVKEAKKLG 180

EEEGTF++LPKKEVALLNKQRARLEKFLGGIEDMPRIPDVMYVVDPHKEQIAVKEAKKLG
Sbjct: 121 EEEGTFDVLPKKEVALLNKQRARLEKFLGGIEDMPRIPDVMYVVDPHKEQIAVKEAKKLG 180 

Query: 181 IPVVAMVDTNADPDDIDVIIPANDDAIRAVKLITSKLADAVIEGRQGEDADVDFAQEAQA 240

IPVVAMVDTNADPDDID+IIPANDDAIRAVKLIT+KLADA+IEGRQGEDADV F + QA
Sbjct: 181 IPVVAMVDTNADPDDIDIIIPANDDAIRAVKLITAKIADAIIEGRQGEDADVAFEADTQA 240

Query: 241 DSIEEIVEVVEGSN 254

DSIEEIVEVVEG N
Sbjct: 241 DSIEEIVEVVEGDN 254
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1472 

A DNA sequence (GBSx1558) was identified in S.agalactiae <SEQ ID 4523> which encodes the amino
acid sequence <SEQ ID 4524>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2648 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB73435 GB:AL139077 elongation factor TS [Campylobacter jejuni]
Identities = 169/358 (47%), Positives = 226/358 (62%), Gaps = 19/358 (5%)

Query: 1 MAEITAKLVKELREKSGAGVMDAKKALVETDGDLDKAIELLREKGMAKAAKKADRVAAEG 60
M EITA +VKELRE +GAG+MD K AL ET+GD DKA++LLREKG+ KAAKKADR+AAEG

Sbjct: 1 MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRtAAEG 60 

Query: 61 LTGVYV--DGNVAAVIEVNAETDFVAKNDQFVTLVNETAKVIAEGRPSNNEEALALTMPS 118
L V V D A V E+N+ETDFVAKNDQF+ L +T I + EE + T+ +

Sbjct: 61 LVSVKVSDDFTSATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTI-N 119

Query: 119 GETLEQAFVTATATIGEKISFRRFALVEKTDEQHFGAYQHNGGRIGVITV-------VEG 171

G E+ + ATIGE + RRFA ++ Y H GR+GV+ V 

Sbjct: 120 GVKFEEYLKSQIATIGENLVvRRFATIiKfiGaNGVVNGYIHTNGRVGVVIAAACDSAEVAS 179 

Query: 172 GDDALAKQVSMHVAAMKPTVLSYTELDAQFVHDELAQLNHKIEQDNESRAMV---NKPAL 228

L +Q+ MH+AAM+P+ LSY +LD FV +E L ++E++NE R + NKP
Sbjct: 180 KSRDLLRQICMHIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEH 239

Query: 229 PFLKYGSKAQLTDEVIAQAEEDIKAELAAEGKPEKIWDKIVPGKMDRFMLDNTKVDQEYT 288

++ S+ QL+D ++ +AEE IK EL A+GKPEKIWD I+PGKM+ F+ DN+++D + T 
Sbjct: 240 KIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLT 299

Query: 289 LLAQVYIMDDSKTVEAYLESV------NAKAVAFVRFEVGEGIEKASNDFEAEVAATM 340

L+ Q Y+MDD KTVE + K V F+ FEVGEG+EK + DF AEVAA +

Sbjct: 300 LMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL 357

A related DNA sequence was identified in S.pyogenes <SEQ ID 4525> which encodes the amino acid 
sequence <SEQ ID 4526>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3942 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 307/344 (89%), Positives = 327/344 (94%)

Query: 1 MAEITAKLVKELREKSGAGVMDAKKALVETDGDLDKAIELLREKGMAKAAKKADRVAAEG 60

MAEITAKLVKELREKSGAGVMDAKKALVETDGD+D
Sbjct: 33 I^ITAKLVKELREKSGAGVMDAKKALTOTDGD^ 92 
Query: 61 LTGVYVDGNVAAVIEVNAETDFVAKNDQFVTLVNETAKVIAEGRPSNNEEALALTMPSGE 120

LTGVYV GNVAAV+EVNAETDFVAKN QFV LVN TAKVIAEG+P+NN+EALAL MPSGE 
Sbjct: 93 LTGVYVHGNVAAVVEVNAETDFVAKNAQFVELVNATAKVIAEGK^ 152

Query: 121 TLEQAFVTATATIGEKISFRRFALVEKTDEQHFGAYQHNGGRIGVITVVEGGDDALAKQV 180

TL +A+V ATATIGEKISFRRFAL+EK DEQHFGAYQHNGGRIGVI+VVEGGDDALAKQV
Sbjct: 153 TLAEAYVNATATIGEKISFRRFALIEKADEQHFGAYQHNGGRIGVISVVEGGDDALAKQV 212

Query: 181 SMHVAAMKPTVLSYTELDAQFVHDELAQLNHKIEQDNESRAMVNKPALPFLKYGSKAQLT 240
SMH+AAMKPTVLSYTELDAQF+ DELAQLNH IE DNESRAMV+KPALPFLKYGSKAQL+

Sbjct: 213 SMHIAAMKPTVLSYTELDAQFIKDELAQLNHAIELDNESRAMVDKPALPFLKYGSKAQLS 272 

Query: 241 DEVIAQAEEDIKAELAAEGKPEKIWDKIVPGKMDRFMLDNTKVDQEYTLLAQVYIMDDSK 300
D+VI AE DIKAELAAEGKPEKIWDKI+PGKMDRFMLDNTKVDQ YTLLAQVYIMDDSK
Sbjct: 273 DDVITAAEADIKAELAAEGKPEKIWDKIIPGKMDRFMLDNTKVDQAYTLLAQVYIMDDSK 332

Query: 301 TVEAYLESVNAKAVAFVRFEVGEGIEKASNDFEAEVAATMAAAL 344 

TVEAYL+SVNAKA+AF RFEVGEGIEK +NDFE+EVAATMAAAL 
Sbjct: 333 TVEAYLDSVNAKAIAFARFEVGEGIEKKANDFESEVAATMAAAL 376 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1473 

A DNA sequence (GBSx1559) was identified in S.agalactiae <SEQ ID 4527> which encodes the amino
acid sequence <SEQ ID 4528>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1312 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1474

A DNA sequence (GBSx1560) was identified in S.agalactiae <SEQ ID 4529> which encodes the amino
acid sequence <SEQ ID 4530>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.86 Transmembrane 128 - 144 ( 124 - 152)
INTEGRAL Likelihood = -4.57 Transmembrane 35 - 51 ( 33 - 53)
INTEGRAL Likelihood = -4.04 Transmembrane 92 - 108 ( 87 - 111)

Final Results

bacterial membrane Certainty=0.4142 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04953 GB:AP001511 small multidrug export related protein 
[Bacillus halodurans] 
Identities = 47/137 (34%), Positives = 71/137 (51%), Gaps = 5/137 (3%)

Query: 12 IPLVELRGAVPFAIANGIPLWEALAIGVVGNMLPVPIIFFFARKVLEWGADKPYTGKFFT 71

+P+VELRG +P + G+ WEAL G++GN+LP+ I R + W + + + 

Sbjct: 1 MPIVELRGGIPLGVVLGLSPWEALLFGIIGNLLPIVTPILLLFRPISGWMLRFKWYQRLYD 60 

Query: 72 WCLKKGHSGGQKLEKVAGEKGLFIALLLFVGIPLPGTGAWTGTLAASLLDWEFKHSVIAV 131

W + +EK I L+LF +PLP TGA++ LAA L F+ + AV 

Sbjct: 61 WLYNRTMKKSNNVEKFGA IGLILFTAVPLPTTGAYSACLAAVLFFI PFRFAFFAV 115 

Query: 132 MLGVTLAGCIMGTLSII 148 

GV++AG +M SI 
Sbjct: 116 SAGVVIAGIVMTLFSYI 132

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8817> and protein <SEQ ID 8818> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 

McG: Discrim Score: 3.98 

GvH: Signal Score (-7.5): -2.35 
Possible site: 26 

>>> Seems to have a cleavable N-term signal seq. 

ALOM program count: 3 value: -7.86 threshold: 0.0 

INTEGRAL Likelihood = -7.86 Transmembrane 128 - 144 ( 124 - 152) 
INTEGRAL Likelihood = -4.57 Transmembrane 35 - 51 ( 33 - 53) 
INTEGRAL Likelihood = -4.04 Transmembrane 92 - 108 ( 87 - 111) 
PERIPHERAL Likelihood = 12.20 109 
modified ALOM score: 2.07 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.4142 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 105-109 

The protein has homology with the following sequences in the databases: 

186 216 246 276 306 336 366 396 

LTIIISNF*KIRK*NLSKDSKTRMTADFSCHY*KDKIKWNNTIERFYLMNYIITFLISMIPLVELRGAVPFAIANGIPLW 

=1= llll|:|:|: I 
MPFSELRGAI PLALYFGFSPA 
10 20 

426 456 486 516 546 576 591 621

EALAIGWGNMLPVPIIFFFARKVLEWGADKPYTGKFFTWCLKKGHSGGQKLEKVAGEKGL FIALLLFVGIPLPG 

|| : |:||:|||| :::| :: : : : :|>| ||: == I :|| 

EAYLLSVLGNILPVPFLLLFLDYLVRIATKVELLARIYR-r RWERVERRKGWERYGYLGLTIFVAIPLPV 

40 50 60 70 80 90 

651 681 711 741 771 801 831 861 

TGAVTOGTIAASLLDWEFKHSVIAVMLGVILAGCIMGTLSIIGFNLF*KS*GEMTVSPF*YLPIHQFDSKIRHLT*AKCLI 

llllllll III = = = II =11 == II h 

TGAWTGTLLAFLLQLNRLKAFLFISAGVCIAGVVVIiLASIGIIRLL 
110 120 130 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1475

A DNA sequence (GBSx1561) was identified in S.agalactiae <SEQ ID 4531> which encodes the amino
acid sequence <SEQ ID 4532>. This protein is predicted to be CtsR protein (ctsR). Analysis of this protein
sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3672 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB91548 GB:AJ249133 CtsR protein [Lactococcus lactis]
Identities = 74/146 (50%), Positives = 103/146 (69%), Gaps = 3/146 (2%)

Query: 4 KNTSDNIEEYIKSLLEQSGIAEIKRSNLADTFQVVPSQINYVIKTRFTESRGYVVESKRG 63

KNTSD IE Y++ LLE++ + EIKR++LA+ F VVPSQINYVIKTRFT S+G+ VESKRG
Sbjct: 5 KNTSDIIEAYLRQLLEEAQVIEIKRADLANQFDVVPSQINYVIKTRFTASKGFDVESKRG 64

Query: 64 GGGYIRIAKVHFSDQHQLFGNMLSTIGERISEQVFDDLIQLLFDEEIITEREGNLILATS 123 

GGGYI+I K +S +H+ + + +S + D++QLLFDE+++TEREGNL+L 
Sbjct: 65 GGGYIKIVKYQYSARHEFLTALYQKVPANLSSKAAHDIVQLLFDEKVLTEREGNLLLLVI 124

Query: 124 GDDVLGEQASVIRARMLRKLLQRLDR 149
D G + R M++ ++ RLDR
Sbjct: 125 TD---GAISPFTRGIMMKSIINRLDR 147

A related DNA sequence was identified in S.pyogenes <SEQ ID 4533> which encodes the amino acid 
sequence <SEQ ID 4534>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2514 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 117/151 (77%), Positives = 131/151 (86%)

Query: 1 miKNTSDNIEEYIKSLLEQSGIAEIKRSNLADTFQVVPSQINYVIKTRFTESRGYVVES 60

M KNTSD+IEEYIK LL +SGIAEIKRS LAD+FQVVPSQINYVIKTRFTESRGY VES
Sbjct: 1 MPTKNTSDSIEEYIKELIAKSGIAEIKRSMLADSFQVVPSQINYVIKTRFTESRGYEVES 60


Query: 61 KRGGGGYIRIAKVHFSDQHQLFGNMLSTIGERISEQVFDDLIQLLFDEEIITEREGNLIL 120 

KRGGGGYIRIAKVHFSD+H L GN+++TI + ISEQVF D IQLLFDE ++TEREGN+IL 
Sbjct: 61 KRGGGGYIRIAKVHFSDKHHLIGNLMATIEDCISEQVFTDSIQLLFDEHLLTEREGNIIL 120 

Query: 121 ATSGDDVLGEQASVIRARMLRKLLQRLDRKG 151

A + DDVLG S IRARML +LLQR+DRKG 
Sbjct: 121 AVASDDVLGTDGSTIRARMLYRLLQRIDRRG 151 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1476

A DNA sequence (GBSx1562) was identified in S.agalactiae <SEQ ID 4535> which encodes the amino
acid sequence <SEQ ID 4536>. This protein is predicted to be ClpC (clpB-1). Analysis of this protein
sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.34 Transmembrane 32 - 48 ( 32 - 49)

Final Results

bacterial membrane Certainty=0.1935 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD01783 GB:AF023422 ClpC [Lactococcus lactis]

Identities = 401/831 (48%), Positives = 571/831 (68%), Gaps = 52/831 (6%)

Query: 4 YSIKLQEVFRIAQFQAARYESHYLESWHLLLAMVLVHDSVAGLTFAEYE SEVAIEEY 60 

Y+ L +F A A +Y+ +ES HLL AM S+A A S++ 1 + 

Sbjct: 8 YTPTLDRIFEKAAEYAHQYQYGTIESAHLIAAMATTSGSIAYSILAGMNVDSSDLLIDLE 67

Query: 61 EAATILALGRAPKEEITNYQFLEQSPALKKILKLAENISIVVGAEDVGTEHVLIAMLVNK 120

+ ++ + + R+ L SP ++++ +A +++ AE VGTEH+L A+L +

Sbjct: 68 DLSSHVKVKRSE LRFSPRAEEWTVASFLAVHNNAEAVGTEHLLYALLQVE 118

Query: 121 DLIATRILELVGFRGQDDGESVRMVDLRKALERHAGF-TKDDIKAIYELRNPKKAKSGAS 179 

D ++L+L + + +V LRK +E+ G ++ KA+ + K AK A 

Sbjct: 119 DGFGLQLLKL QKINIVSLRKEIEKRTGLIVPENKKAVTPMSKRKMAKGVAE 169 

Query: 180 FSDMMKPPSTAGDLADFTRDLSQMAVDGEIEPVIGRDKEISRMVQVLSRKTKNNPVLVGD 239

S+ L + DL++ A G+++P+IGR+ E+ R++ +LSR+TKNNPVLVG+ 
Sbjct: 170 NSSTPTLDSVSSDLTEAARSGKLDPMIGREAEVDRLIHILSRRTKNNPVLVGE 222 

Query: 240 AGVGKTALAYGLAQRIANGNIPYELRDMRVLELDMMSVVAGTRFRGDFEERMNQIIADIE 299
GVGK+A+ GLAQRI NG +P L + R++ L+M +VVAGT+FRG+FE+R+ I+ ++

Sbjct: 223 PGVGKSAIIEGLAQRIVNGQVPIGLMNSRIMALNMATVVAGTKFRGEFEDRLTAIVEEVS 282

Query: 300 EDGHIILFIDELHTIMGSGSGIDSTLDAANILKPALARGTLRTVGATTQEEYQKHIEKDA 359 
D +I+FIDELHTI+G+G G+DS DAANILKPALARG + VGATT EYQK+IEKD 
Sbjct: 283 ADPDVIIFIDELHTIIGAGGGMDSVNDAANILKPALARGDFQMVGATTYHEYQKYIEKDE 342

Query: 360 ALSRRFAKVLVEEPNLEDAYEILLGLKPAYEAFHNVTISDEAVMTAVKVAHRYLTSKNLP 419 

AL RR A++ V+EP+ ++A IL GL+ +E +H V +D+A+ +AV ++ RY+TS+ LP 
Sbjct: 343 ALERRLARINVDEPSPDEAIAILQGLREKFEDYHQVKFTDQAIKSAVTLSVRYMTSRKLP 402 


Query: 420 DSAIDLLDEASATVQMMI KKNAPSLLT EVDQAI LDDDMKSA 460

D AIDLLDEA+A V++++K ++ E+ +A++ D+K++ 

Sbjct: 403 DKAIDLLDEAAARVKILLKTKKQNVFELEKDFVKAQEELAEAVIKLDVKASRIKEKAVE 462 

Query: 461 --SKALKASYKGKKRKPIAVTEDHIMATLSRLSGIPVEKLTQADSKKYLNLEKELHKRVI 518

K K S K +KR+ VT+ ++A S L+G+P+ ++T+++S + +NLEKELHKRV+ 
Sbjct: 463 ISDKIYKFSIKEEKRQE--VTDQAVIAVASTLTGVPITQMTKSESDRLINLEKELHKRVV 520

Query: 519 GQDDAVTAISRAIRRNQSGIRTGKRPIGSFMFLGPTGVGKTELAKALAEVLFDDESALIR 578
GQ++A++A+SRAIRR +SG+ +RP+GSFMFLGPTGVGKTELAKALA+ +F E +IR

Sbjct: 521 GQEEAISAVSRAIRRARSGVADSRRPMGSFMFLGPTGVGKTELAKALADSVFGSEDNMIR 580 

Query: 579 FDMSEYMEKFAASHLNGAPPGYVGYDEGGELTEKVRNKPYSVLLFDEVEKAHPDIFNVLL 638
DMSE+MEK + S L GAPPGYVGYDEGG+LTE+VRNKPYSV+L DEVEKAH D+FN++L
Sbjct: 581 VDMSEFMEKHSTSRLIGAPPGYVGYDEGGQLTERVRNKPYSVVLLDEVEKAHLDVFNIML 640

Query: 639 QVLDDGVLTDSRGRKVDFSNTIIIMTSNLGATALRDDKTVGFGAKDISHDYTAMQKRIME 698 

Q+LDDG +TD++GRKVDF NTIIIMTSNLGATALRDDKTVGFGAK+I+ DY+AMQ RI+E
Sbjct: 641 QILDDGFVTDTKGRKVDFRNTIIIMTSNLGATALRDDKTVGFGAKNITADYSAMQSRILE 700

Query: 699 ELKKAYRPEFINRIDEKVVFHSLSQDNMREVVKIMVKPLILALKDKGMDLKFQPSALKHL 758

ELK+ YRPEF+NRIDE +VFHSL + ++VKIM K LI L ++ + +K PSA+K + 
Sbjct: 701 ELKRHYRPEFLNRIDENIVFHSLESQEIEQIVKIMSKSLIKRLAEQDIHVKLTPSAIKLI 760 


Query: 759 AEDGYDIEMGARPLRRTIQTQVEDHLSELLLANQVKEGQVIKIGVSKGKLK 809 

AE G+D E GARPLR+ +Q +VED LSE LL+ ++K G I IG S K+K 
Sbjct: 761 AEVGFDPEYGARPLRKALQKEVEDLLSEQLLSGEIKAGNHISIGASNKKIK 811 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4537> which encodes the amino acid
sequence <SEQ ID 4538>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.75 Transmembrane 32 - 48 ( 32 - 48)

Final Results

bacterial membrane Certainty=0.1702 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

RGD motif: 285-287

An alignment of the GAS and GBS proteins is shown below.

Identities = 618/814 (75%), Positives = 716/814 (87%), Gaps = 1/814 (0%)

Query: 1 MSHYSIKLQEVFRLAQFQAARYESHYLESWHLLLAMVLVHDSVAGLTFAEYESEVAIEEY 60 

M YS K+Q++FR AQFQAAR++SH LE+WH+LLAMV V +S+A + +EY+++VAIEEY 
Sbjct: 1 MI^T!fSTKMQDIFRQAQFQAARroSHCIIETWHvLI4A^IVAvDNSIlANMILSEYDAQVAIEEY 60 


Query: 61 EAATILALGRAPKEEITNYQFLEQSPALKKILKIAENISIVVGAEDVGTEHVLIAMLVNK 120

EAA ILA+G+ PKE+++ F QS L +L A+ IS + ++VG+EHVL A+L+N 
Sbjct: 61 EAAAILAMGKTPKEQLSRVDFRPQSKTLTNLLAFAQAISQITRDQEVGSEHVLFAILLNP 120 

Query: 121 DLLATRILELVGFRGQDDGESV-RMVDLRKALERHAGFTKDDIKAIYELRNPKKAKSGAS 179

D++A+R+LE+ G++ +D+G R+ DLRKA+ERHAG++K+ IKAI+ELR PKK K+ + 
Sbjct: 121 DIMASRLLEIAGYQIKDNGNGQPRLADLRKAIERHAGYSKEMIKAIHELRKPKKTKTQGT 180 

Query: 180 FSDMMKPPSTAGDLADFTRDLSQMAVDGEIEPVIGRDKEISRMVQVLSRKTKNNPVLVGD 239 
FSDMMKPPSTAG+L+DFTRDL++MA G +E VIGRD+E+SRM+QVLSRKTKNNPVLVGD

Sbjct: 181 FSDMMKPPSTAGELSDFTRDLTEMARQGLLESVIGRDQEVSRMIQVLSRKTKNNPVLVGD 240 

Query: 240 AGVGKTALAYGLAQRIANGNIPYELRDMRVLELDMMSVVAGTRFRGDFEERMNQIIADIE 299
AGVGKTALAYGLAQRIANG IPYEL++MRVLELDMMSVVAGTRFRGDFEERMNQII DIE
Sbjct: 241 AGVGKTALAYGLAQRIANGAIPYELKEMRVLELDMMSVVAGTRFRGDFEERMNQIIDDIE 300

Query: 300 EDGHIILFIDELHTIMGSGSGIDSTLDAANILKPALARGTLRTVGATTQEEYQKHIEKDA 359 

DG IILF+DELHTIMGSGSGIDSTLDAANILKPAL+RGTL VGATTQEEYQKHIEKDA 
Sbjct: 301 ADGQIILFVDELHTIMGSGSGIDSTLDAANILKPALSRGTLHMVGATTQEEYQKHIEKDA 360 



Query: 360 ALSRRFAKVLVEEPNLEDAYEILLGLKPAYEAFHNVTISDEAVMTAVKVAHRYLTSKNLP 419

ALSRRFAK+L+EEPN EDAY+IL+GLK +YE +HNV+IS+EAV TAVK+AHRYLTSKNLP
Sbjct: 361 ALSRRFAKILIEEPNTEDAYQILMGLKLSYETYHNVSISNEAVKTAVKMAHRYLTSKNLP 420

Query: 420 DSAIDLLDEASATVQMMIKKNAPSLLTEVDQAILDDDMKSASKALKASYKGKKRKPIAVT 479

DSAIDLLDEASA VQ M+KK+AP LT +DQA+++ DMK S+ L KG+ RKP VT 
Sbjct: 421 DSAIDLLDEASAAVQNMVKKSAPETLTPIDQALINGDMKKVSRLLAKEAKGQMRKPTPVT 480 

Query: 480 EDHIMATLSRLSGIPVEKLTQADSKKYLNLEKELHKRVIGQDDAVTAISRAIRRNQSGIR 539
ED I+ATLS+LSGIP+EKLTQADSKKYLNLEKELHKRVIGQD AVTAISRAIRRNQSGIR

Sbjct: 481 EDDILATLSKLSGIPLEKLTQADSKKYLNLEKELHKRVIGQDAAVTAISRAIRRNQSGIR 540 

Query: 540 TGKRPIGSFMFLGPTGVGKTELAKALAEVLFDDESALIRFDMSEYMEKFAASHLNGAPPG 599 
TGKRPIGSFMFLGPTGVGKTELAKALAEVLFDDE+ALIRFDMSEYMEKFAAS LNGAPPG
Sbjct: 541 TGKRPIGSFMFLGPTGVGKTELAKALAEVLFDDEAALIRFDMSEYMEKFAASRLNGAPPG 600

Query: 600 YVGYDEGGELTEKVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDGVLTDSRGRKVDFSNT 659

YVGYDEGGELT+KVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDG+LTDSRGRKVDFSNT
Sbjct: 601 YVGYDEGGELTQKVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDGILTDSRGRKVDFSNT 660


Query: 660 IIIMTSNLGATALRDDKTVGFGAKDISHDYTAMQKRIMEELKKAYRPEFINRIDEKVVFH 719

IIIMTSNLGATALRDDKTVGFG KDI  D+ AM+KRI+EEL+K YRPEFINRIDEKVVFH
Sbjct: 661 IIIMTSNLGATALRDDKTVGFGVKDIHQDHQAMEKRILEELRKTYRPEFINRIDEKVVFH 720

Query: 720 SLSQDNMREVVKIMVKPLILALKDKGMDLKFQPSALKHLAEDGYDIEMGARPLRRTIQTQ 779

SL+QDNMR+VVKIMV+PLI  L +KG+ LK QP ALKHL+E GYD  MGARPLRRT+QT+
Sbjct: 721 SLTQDNMRDVVKIMVQPLITTLAEKGITLKIQPLALKHLSEVGYDEHMGARPLRRTLQTE 780

Query: 780 VEDHLSELLLANQVKEGQVIKIGVSKGKLKFDIA 813 
+ED LSEL+L+ ++ G +KIG+S GKL F IA

Sbjct: 781 IEDKLSELILSRELTSGHTLKIGLSHGKLTFHIA 814 

A related GBS gene <SEQ ID 8819> and protein <SEQ ID 8820> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9

McG: Discrim Score: -13.52
GvH: Signal Score (-7.5): -2.1

Possible site: 49
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -2.34 threshold: 0.0

INTEGRAL Likelihood = -2.34 Transmembrane 32 - 48 ( 32 - 49)
PERIPHERAL Likelihood = 0.95 112
modified ALOM score: 0.97

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1935 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

47.4/69.6% over 804aa 

Listeria monocytogenes 

EGAD|136761| ClpC ATPase Insert characterized

GP|1314297|gb|AAC44446.1| |U40604 ClpC ATPase Insert characterized

ORF00207(298 - 2727 of 3045)

EGAD|136761|145854 (2 - 806 of 825) ClpC ATPase {Listeria monocytogenes}
GP|1314297|gb|AAC44446.1| |U40604 ClpC ATPase {Listeria monocytogenes}

%Match =33.6 

%Identity =47.4 %Similarity =69.6 

Matches = 372 Mismatches = 229 Conservative Sub.s = 174 

87 117 147 177 207 237 267 297

SFF*STPIIWKYVINDWRAYQ*TSF**FDSIIIR*RDNYRT*RKFDSGDIR**RLRRASLCY*SSYAP*IITTIR*KRIP 



327 357 387 417 447 477 507 537

FMSHYSIKLQEVFRLAQFQAARYESHYLESWHLLLAMVLVHDSVAGLT 






MFGRFTQRAQKOTLALSQEEAMRLNHSNLGTEHILLGLVREGEGIAA--KALYELGISSEK^QQEVEGLIGHG-EKAVTTI 
20 30 40 50 60 70 



567 597 627 657 687 717 744 774 

QFLEQSPALKKILKIAENISIVVGAEDVGTEHVLIAMLVNKDLLATRILELVGFRGQDDGESV-RMVDLRKALERHAGFT

|: I l|:::h = = =1 llllhll :: = =1 hi =1 : | ::: . | 

QYT---PRAKKVIELSMDEARKLGHTYVGTEHILLGLIREGEGVAARVLSNLGISLNKARQQVLQLLGGGDA

90 100 110 120 130 140



804 834 864 894 924 954 984 1014 

KDDIKAIYELRNPKKAKSGASFSDMMKPPSTAGDLADFTRDLSQMAVDGEIEPVIGRDKEISRMVQVLSRKTKNNPVLVG

:|| = I II |||: =1 = = = 11111 III I = = = I I I I = I I I I I I I = I 
-TGAGRQTNTQATPTLDSIA---RDLTVIAREDMLDPVIGRSKEIQRVIEVLSRRTKNNPVLIG

150 160 170 180 190 200 

1044 1074 1104 1134 1164 1194 1224 1254 

DAGVGKTALAYGLAQRIANGNIPYELRDMRVLELDMMSVVAGTRFRGDFEERMNQIIADIEEDGHIILFIDELHTIMGSG

= 111111=1 1111=1 =1 II 11= III :||lll==ll=ll=l= === =1 = l==lllllllll==l=l 

EPGVGKTAIAEGIAQQIVRNEVPETLRGKRVMTLDMGTVVAGTKYRGEFEDRLKKVMDEIRQAGNVILFIDELHTLIGAG

220 230 240 250 260 270 280 



1284 1314 1344 1374 1404 1434 1464 1494 

SGIDSTLDAANILKPALARGTLRTVGATTQEEYQKHIEKDAALSRRFAKVLVEEPNLEDAYEILLGLKPAYEAFHNVTIS

i = =11 = 11111 mi i= =iiii =11 = 1 = 1111 ii iii = 1 = 11 =i = = =n n= in iii = 

-GAEGAIDASNILKPPLARGELQCIGATTLDEYRKYIEKDRALERRFQPIKVDEPTVEESIQILHGLRDRYEAHHRVAIT 
300 310 320 330 340 350 360 

1524 1554 1584 1614 1644 1674 1704

DEAVMTAVKVAHRYLTSKNLPDSAIDLLDEASATVQM MI KKNAPSLLTEVDQAI LDDDMKSASKALKASY 

|||: ||::: ||:= : ||| |||::||: : |:: :: | | | | |: . = : |: 

DEALFAAVRLSDRYISDRFLPDKAIDVIDESGSKVRLKSFTTPKWKEMENNLSDLKKEKDAAVQGQEFEKAASLRDKEQ 
380 390 400 410 420 430 440 


1725 1737 1767 1797 1827 1857 1887 

KGKK RKPIA VTEDHIMATLSRLSGIPVEKLTQADSKKYLNLEKELHKRVIGQDDAVTAISR 

iii =i = mi = == =ini n = == i n = n iniiini n ni 

KLKKSLDKKSLEETKANWQEKQGLDHSEVTEDIVAEWASWTC 
460 470 480 490 500 510 520

1917 1947 1977 2007 2037 2067 2097 2127 

AIRRNQSGIRTGKRPIGSFMFLGPTGVGKTELAKALAEVLFDDESALIRFDMSEYMEKFAASHLNGAPPGYVGYDEGGEL 

nn = = i = = nniiniininninnnii =i n = = n iniiinn = i innninnni 

AVRRARAGLKDPKRPIGSFIFLGPTGTOKTELARALftESMFGDEDSMIRIDMSEYMEK^^

540 550 560 570 580 590 600 

2157 2187 2217 2247 2277 2307 2337 2367 

TEKVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDGVLTDSRGRKVDFSNTIIIMTSNLGATALRDDKTVGFGAKDISHDY

inn nnnninn n mini 1 1 inn innn in inn 1111 = 11 == n = = n 1 n

TEKVRQKPYSVVLLDEIEKAHPDVFNMLLQVLDDGRLTDSKGRWDFRNTVIIMTSNIGAQEMKQDICSMGFNVTDPLKDH 
620 630 640 650 660 670 680 

2397 2427 2457 2487 2517 2547 2577 2607 

TAMQKRIMEELKKAYRPEFINRIDEKVVFHSLSQDNMREVVKIMVKPLILALKDKGMDLKFQPSALKHLAEDGYDIEMGA

n= i====n:i = niiinin =11111 •• ••••••••1 1 1 = =i= 1 ••1 = 1111 1 11 

KAMEHRVLQDLKQAFRPEFINRIDETIVFHSLQEKELKQIVTLLTAQLTKRLAERDIHVKLTEGAKSKIAKDGYDPEYGA 
700 710 720 730 740 750 760 

2637 2667 2697 2727 2757 2787 2817 2847

RPLRRTIQTQVEDHLSELLLANQVKEGQVIKIGVSKGKLKFDIAKS*NIPVPMGTGILI*KENVQNILD1FL*IYEK*KD 

nm ii =ni iii ii =i i = = in nn 

RPLKRAIQKEVEDMLSEELLRGNICTGDYVEIGVKDGKLETOKKDAPKKKTTSKKVKAK 
780 790 800 810 820

There is also homology to SEQ ID 258. 

SEQ ID 8820 (GBS26) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 9; MW 93.3kDa), in Figure 167 (lane 16 & 17; MW 108kDa) and in 
Figure 239 (lane 14; MW 108kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE
analysis of total cell extract is shown in Figure 15 (lane 7; MW 118kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1477

A DNA sequence (GBSx1563) was identified in S.agalactiae <SEQ ID 4539> which encodes the amino
acid sequence <SEQ ID 4540>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4541> which encodes the amino acid
sequence <SEQ ID 4542>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 178/213 (83%), Positives = 199/213 (92%)

Query: 1 MLIvIiAGTIGB^KSSLAAALGQHLGTDW^ 60 

MLIVIiAGTIGAGKSSIAAAIfi+HIjGTDVFYFAVD^ 
Sbjct: 1 MLIVIAGTIGAGKSSLAAAIXSEHLGTDVFYEAVDNNPvIiDLYYQDPKKYAFLLQIYFLNK 60 

Query: 61 RFQSIKEAYKANNNVLDRSIFEDELFLTLNYKNGNVTKTELDIYKELLANMLEELEGMPK 120

RF+SIKEAY+A+NN+LDRSIFEDELFL LNYKNGNVTKTELDIY+ELLANMLEELEGMPK
Sbjct: 61 RFKSIKEAYQADNNILDRSIFEDELFLKLNYKNGNVTKTELDIYQELLANMLEELEGMPK 120 

Query: 121 KRPDLLVYIDVSFDKMLERIDKRGRSFEQVDSNPELYDYYKQVHSEYPEWYENYDVSPKI 180
KRPDLL+YIDVSFDKMLERI++RGRSFEQVD NP L  YY QVH EYP WYE+Y+VSPK+

Sbjct: 121 KRPDLLIYIDVSFDKMLERIERRGRSFEQVDGNPSLEQYYHQVHGEYPTWYEDYEVSPKM 180 

Query: 181 RIDGNKLDFVKNPEDLQHVLDTIDSELQKLDLL 213 
+IDGN LDFV+NP+DL VL ID++L++L LL 
Sbjct: 181 KIDGNSLDFVQNPQDLATVLKMIDTKLKELHLL 213

A related GBS gene <SEQ ID 8821> and protein <SEQ ID 8822> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 3.94

GvH: Signal Score (-7.5): 1.42 

Possible site: 17 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 7.69 threshold: 0.0
PERIPHERAL Likelihood = 7.69 49

modified ALOM score: -2.04

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 4540 (GBS9) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 5; MW 52kDa) and Figure 12 (lane 2 & 3; MW 50.3kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 2 
(lane 6; MW 27kDa) and Figure 3 (lane 2; MW 25kDa). The GBS9-GST fusion product was purified 
(Figure 191, lane 6) and used to immunise mice. The resulting antiserum was used for FACS (Figure 318),
which confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1478 

A DNA sequence (GBSx1564) was identified in S.agalactiae <SEQ ID 4543> which encodes the amino
acid sequence <SEQ ID 4544>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1182 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4545> which encodes the amino acid
sequence <SEQ ID 4546>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 281/323 (86%), Positives = 305/323 (93%)

Query: 3 QLNSSFMIGKVEIPHRTVLAPMAGITNSAFRTIAKEFGAGLVVMEMISEKGLLYNNEKTL 62
+LNSSF IG VEIPHRTVLAPMAG+TNSAFRTIAKEFGAGLVVMEMISEKGLLYNNEKTL
Sbjct: 27 KLNSSFRIGDVEIPHRTVLAPMAGVTNSAFRTIAKEFGAGLVVMEMISEKGLLYNNEKTL 86

Query: 63 HMLHIDENEHPMSIQLFGGDAEGLKRAADFIQSNTKADIVDINMGCPTOKWKNEAGAKW 122
HMLHIDENEHPMSIQLFGGDAEGLKRAADFIQ+NTKADIVDINMGCPWKVVKNEAGAKW
Sbjct: 87 HMLHIDENEHPMSIQLFGGDAEGLKRAADFIQTNTKADIVDINMGCPWKOTKNEAGAK^ 146

Query: 123 LRDPEKIYHIVKEVTSVLDIPLTVKMRTGWSDSSNAIENALAAESAGVSAIAMHGRTREQ 182
LRDP+KIYHIVKEVTSVLDIPLTVKMRTGW+DSS A+ENALAAESAGVSALAMHGRTREQ
Sbjct: 147 LRDPDKIYHIVKEVTSVLDIPLTVKMRTGWADSSlAVENAIAAESAGVSAIAMHGRTREQ 206

Query: 183 MYTGTCDHETLGKVAKAOTSIPFIANGDIRTVHDAKFMIEEIGADAIMVGRGARSNPYIF 242
MYTGTCDHETL +V+KA+T IPFI NGD+R+V DAKFMIEEIG DA+M+GR A +NPY+F
Sbjct: 207 MYTGTCDHETLARVSKAITKIPFIGNGDVRSVQDAKFMIEEIGVDAVMIGRAAMNNPYLF 266

Query: 243 TQINHFFETGEILPDLPFEKMLDVAEDHLTRLVNLKGETIAVREFRGLAPHYLRGKSGAA 302
TQINHFFETG+ LPDLPF K LD+A+DHL RL+NLKGETIAVREFRGLAPHYLRG +GAA
Sbjct: 267 TQINHFFETGQELPDLPFAKKLDIAKDHLKRLINLKGETIAVREFRGLAPHYLRGTAGAA 326

Query: 303 KIRGAVSRAETLAEVQELFAGLR 325
K+RGAVSRAETLAEV+ +F +R
Sbjct: 327 KVRGAVSRAETLAEVEAIFETVR 349

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1479 

A DNA sequence (GBSx1565) was identified in S.agalactiae <SEQ ID 4547> which encodes the amino
acid sequence <SEQ ID 4548>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2164 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 3930: 

Identities = 235/288 (81%), Positives = 259/288 (89%)

Query: 1 MDKIIKSISTSGSFRAYVLDCTETVRTAQEKHQTLSSSTVALGRTLIANQILAANQKGNS 60

MDKIIKSI+ SG+FRAYVLD TETV AQEKH TLSSSTVALGRTLIANQILAANQKG+S
Sbjct: 1 MDKIIKSIAQSGAFRAYVLDSTETVAIAQEKHNTLSSSTVALGRTLIANQILAANQKGDS 60

Query: 61 KVTVKVIGDSSFGHIISVADTKGNVKGYIQNTGVDIKKTATGEVLVGPFMGNGHFVVITD 120

K+TVKVIGDSSFGHIISVADTKG+VKGYIQNTGVDIKKTATGEVLVGPFMGNGHFV I D
Sbjct: 61 KITVKVIGDSSFGHIISVADTKGHVKGYIQNTGVDIKKTATGEVLVGPFMGNGHFVTIID 120

Query: 121 YATGQPYTSTTPLITGEIGEDFAYYLTESEQTPSAVGLNVLLDDEDKVKVAGGFMLQVLP 180

Y TG PYTSTTPLITGEIGEDFAYYLTESEQTPSA+GLNVLLD+ DKVKVAGGFM+QVLP
Sbjct: 121 YGTGNPYTSTTPLITGEIGEDFAYYLTESEQTPSAlGLNVLLDENDKVKVAGGFMVQVLP 180

Query: 181 GASDEEISRYEKRIQEMPSISSLLESENHIESLLSAIYGEDDYKRLSEDSLAFYCDCSKE 240
GAS+EEI+RYEKR+QEMP+IS LL S+NH+++LL AIYG++ YKRLSE+ L+F CDCS+E

Sbjct: 181 GASEEEIARYEKRLQEMPAISHLLASKNHVDALLEAIYGDEPYKRLSEEPLSFQCDCSRE 240 

Query: 241 RFEAALLTLGTKELQAMKDEDKGVEITCQFCNQTYYFTEEDLEKIIND 288 
RFEAAL+TL +LQAM DEDKG EI CQFC Y F E DLE II+D 
Sbjct: 241 RFEAALMTLPKADLQAMIDEDKGAEIVCQFCGTKYQFNESDLEAIISD 288

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1480 

A DNA sequence (GBSx1566) was identified in S.agalactiae <SEQ ID 4549> which encodes the amino
acid sequence <SEQ ID 4550>. This protein is predicted to be surface-located membrane protein 1 (lmp1).
Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4312 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB93480 GB:AF019377 tellurite resistance protein [Rhodobacter 
sphaeroides] 

Identities = 64/350 (18%), Positives = 146/350 (41%), Gaps = 7/350 (2%)

Query: 44 LTPAQKSAISEKTPALVDTFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDL 103

LA E + + V D +++ FG A + T +L++ K + D 

Sbjct: 34 LASAPPEKAQEIRRRMAELNVSDSQSIIGFGSKAQAELQTISQQMLADVKNKDVGPAGDS 93 

Query: 104 LKNANRELNGFIAKYKDATPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAA 163
L+ + GF + ++ +K + ++L ++ F ++++Q++D + 

Sbjct: 94 LREWSTIRGF SVSEFDVRRKASWWERLLGRT-APFARFVARYEDVQQQIDRITQ 147 

Query: 164 NVVKQEDTLARNIVSAEMLIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDS 223
+++ E L ++I ++L + L IA + A+ R ++ +A

Sbjct: 148 SLLTHEHRLLKDIKGLDILYARTLDFYDELALYIAAGDEVLADLDGRVIPAKEAEVAATP 207

Query: 224 QTSEYQIKSNQLARMTEVINTLEQQHPEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGML 283
+ + IK+ +L + + LE++ + V +P+R + + +++

Sbjct: 208 E-GDRMIKAQELRDLRAARDDLERRVHDLKLTRQVTMQSLPSIRLVQENDKALVTRINST 266

Query: 284 RRNTIPTMKLSIAQLGMMQQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSP 343

NT+P + +AQ +Q+S ++ + N L AE ++A ++ K + 

Sbjct: 267 LVMTVPLWETQLAQAVTIQRSREAAEAVRGASDLTNELLTANAENLQQANKIVRKEMERG 326 

Query: 344 TVSIKSVTALAESLVAQNNGIIAAIDKGRKERAQLESAVIKSAETINDSV 393

I++V +L+A N +A D+GR RA E+ + + + D++

Sbjct: 327 VFDIEAVKKANATLIATINESLAIADEGRARRATAETELQRMEAELRDTL 376 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4551> which encodes the amino acid
sequence <SEQ ID 4552>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3230 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 333/413 (80%), Positives = 379/413 (91%)

Query: 5 FNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFV 64 
FNFDIDQIADNA+ KTDKTT+IIS+ T GQI+FFEKL+ Q++AI+ K PALVDTF+ 
Sbjct: 4 FNFDIDQIADNAVIKTDKTTDIISDLPTDTNGQISFFEKLSADQQTAITAKAPALVDTFL 63

Query: 65 GDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPA 124

DQNALLDFGQSAVEGVN TVNHI++EQKK+QIPQVDDLLK+ NRELNGFIAKYKDATP
Sbjct: 64 ADQNALLDFGQSAVEGVNATVNHIIAEQKKLQIPQVDDLLKSTNRELNGFIAKYKDATPV 123

Query: 125 ELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIE 184

+L+KKPN +QKLFKQS+ +LQEFYFDSQNIEQKMD MAA VVKQEDTLARNIVSAE+LIE
Sbjct: 124 DLDKKPNFLQKLFKQSRDTLQEFYFDSQNIEQKMDSMAAAVVKQEDTLARNIVSAELLIE 183

Query: 185 DNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINT 244

DNTKSIE+LVGVIAFIE+SQ EA+ RA+ LQ+++ DS T +YQIK++ LAR TEVINT 
Sbjct: 184 DNTKSIEHLVGVIAFIFASQKEASQRAAM^KDLKTKDSATPDYQIKADLIARTTEVINT 243 

Query: 245 LEQQHPEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQS 304
LEQQH EY+SRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQS

Sbjct: 244 LEQQHTEYLSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQS 303

Query: 305 VKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGI 364
VKSG+TADAI+NANNAALQMLAETSKEAIP LE++AQ+PT+S+KSVT+LAESLVAQNNGI
Sbjct: 304 VKSGMTADAIINANNAALQMLAETSKEAIPALEQSAQNPTLSMKSVTSLAESLVAQNNGI 363

Query: 365 IAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDE 417 

IAAID GRKERAQLESA+I+SAETINDSVK+RD+ IV+ALL+EGK TQ+ +D+ 
Sbjct: 364 IAAIDHGRKERAQLESAIIRSAETINDSVKLRDQNIVQALLSEGKETQKTIDK 416

SEQ ID 4550 (GBS201) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 49 (lane 5; MW 49kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 54 (lane 3; MW 74.5kDa) and in Figure 
62 (lane 8 & 9; MW 74.5kDa). The GBS201-GST fusion product was purified (Figure 209, lane 9) and 
used to immunise mice. The resulting antiserum was used for FACS (Figure 304), which confirmed that the
protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1481 

A DNA sequence (GBSx1567) was identified in S.agalactiae <SEQ ID 4553> which encodes the amino
acid sequence <SEQ ID 4554>. This protein is predicted to be a rhoptry protein. Analysis of this protein
sequence reveals the following: 

Possible site: 27 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -6.58 Transmembrane 13 - 29 ( 10 - 31)

INTEGRAL Likelihood = -1.54 Transmembrane 33 - 49 ( 33 - 49)

Final Results

bacterial membrane --- Certainty=0.3633 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4555> which encodes the amino acid 
sequence <SEQ ID 4556>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below.

Identities = 115/239 (48%), Positives = 162/239 (67%), Gaps = 3/239 (1%)

Query: 32 EVIATLLIIGGGYCAYYVYD-KKRLKRFTSNQRIEALKSDIKETDQDIRHLEILKKDNRS 90 
+++ + I G GY + V +KRL + +++E LK+ I+ D+ +R L+ D+

Sbjct: 42 DILPAIAIGGTGYAIFRVVSHQKRLAKAKIAKQLEDLKAKIQLADRKVRLLDTYLADHDD 101

Query: 91 KEYIKLAHQILPQLDLIRNEANQLQKAIEPNIYKRITKKANTFSNEINEQLIKLHASPEL 150
+Y LA Q+LPQL+ I+ +A L+ ++P IY+RITKKAN ++I QL L + L
Sbjct: 102 FQYNVLAQQLLPQLSDIKAKAITLKDQLDPQIYRRITKKANDVESDITLQLETLQIATTL 161

Query: 151 --EPISDQEDEMIRIAPELKPFYHNIQDDHFAILKKIEEADNKAELAAIHQANMKRFTDV 208

+P+ +I APELKP+Y NIQ DH AIL KI+ ADN+ EL A+H ANM+RF D+

Sbjct: 162 NPQPLKTPSPNLINKAPELKPYYDNIQTDHQAILAKIQGADNQEELLALHDANMRRFEDI 221

Query: 209 LAGYIRIKQSPKNFNNAKERLEQALQAIKKFNLDLDETLRQLNESDMKDFDVSLR^QG 267

L GY++IK+ PKN+ NA RLEQA QAI++F+ DLDETLR+LNESD+KDFD+SLR+MQG 
Sbjct: 222 LTGYLKIKEEPKNYYNAAARLEQAKQAIQQFDEDLDETLRRLNESDLKDFDISLRIMQG 280 
SEQ ID 4554 (GBS265) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 54 (lane 2; MW 56kDa) and in Figure 62 (lane 6; MW 56.3kDa). 

The GBS265-GST fusion product was purified (Figure 207, lane 5) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 258A) and FACS (Figure 258B). These tests confirm
that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1482 

A DNA sequence (GBSx1568) was identified in S.agalactiae <SEQ ID 4557> which encodes the amino
acid sequence <SEQ ID 4558>. This protein is predicted to be glutamate-cysteine ligase (gshA). Analysis 
of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 575 - 591 ( 575 - 591)

Final Results

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG08588 GB:AE004933 glutamate-cysteine ligase [Pseudomonas aeruginosa]
Identities = 142/468 (30%), Positives = 220/468 (46%), Gaps = 62/468 (13%)

Query: 12 SHLPIL-QATFGDERESLRIHQPTQRVAQTPHPKTLGSRNYHPYIQTDYSEPQLELITPI 70 

++LP+L + G+ERE LR+ ++A TPHP+ LGS HP I TDYSE LE ITP 

Sbjct: 16 ANLPLLTECLHGIERECLRVDSDG-KIALTPHPRALGSTLTHPQITTDYSEALLEFITPT 74

Query: 71 AKDSQEAIRFLKAISDVAGRSINHDEYLWPLSMPPKV-REEDIQIAQLEDA FEYDY 125 

D + + L+ I A ++ EYLW SMP ++ EE I IA+ + +Y Y 

Sbjct: 75 ETDVADTLGDLERIHRFASSKLD-GEYLWSPSMPCELPDEESIPIARYGSSMIGRLKYVY 133 

Query: 126 RKYLEKTYGKLIQSISGIHYNLGLGQELLTSLFELSQAD-NAIDFQNQLYMKLSQNFLRY 184 

RK L YGK +Q I+GIHYN L + L L + ++ + D+Q+ Y+ L +NF RY 
Sbjct: 134 RKGLALRYGKTMQCIAGIHYNFSLPERLWPLLRQAEGSELSERDYQSAAYIALIRNFRRY 193 

Query: 185 RWLLTYLYGASPVAEEDFLDQKLNNPVR SLRNSHLGYVNHKDIRIS-- 230

WLL YL+GASP + FL + + R SLR S LGY N+ ++ 

Sbjct: 194 SWLLMYLFGASPALDAGFLRGRPSQLERLDEHTLYLPYATSLRMSDLGYQNNAQAGLTPC 253 

Query: 231 YTSLKDYVNDLENAV KSGQLIAEKEFYSPVRLR G 264 

Y L+ Y++ L AV + L E E+YS +R + G 

Sbjct: 254 YNDLQSYIDSLRQAVSTPYPPYEKVGTKQDGEVIVQLNTNILQIENEYYSSIRPKRVTYTG 313 

Query: 265 SKACRNYLEKGITYLEFRTFDLNPFSPIGITQETVOTVHLFLLALLWIDS 314 

+ + +G+ Y+E R D+NPF P+GI + + FLL + DS 
Sbjct: 314 ERPVQALAARGVQYVEVRCLDINPFLPLGIDLDEARFLDAFLLFCAFSDSPLLNGECSDA 373 

Query: 315 SSHIDQDIKEANRLN-DLIALSHPLEKLPNQAPVSDLVDAMQSVIQHFNLSPYYQDLLES 373 

+ + +KE R L P+E + + + +++ + L + 

Sbjct: 374 TDNFIAWKEGRRPGLQLQRRGQPVELQVWANELLERIADTAALLDRARGGEAHAAALAA 433 

Query: 374 VKRQIQSPELTVAGQLLEMI--EGLSLETFGQRQGQIYHDYAWEAPYA 419

+ ++ ELT + Q+L+++ GSEF RQ + + +Y + P A 
Sbjct: 434 QRAKVADAELTPSAQVLKVMRERGESFEAFSLRQSREHAEYFRQHPLA 481 
There is also homology to SEQ ID 4560.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1483 

A DNA sequence (GBSx1569) was identified in S.agalactiae <SEQ ID 4561> which encodes the amino
acid sequence <SEQ ID 4562>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1504 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB73814 GB:AL139078 helix-turn-helix containing protein
[Campylobacter jejuni]
Identities = 107/223 (47%), Positives = 148/223 (65%), Gaps = 7/223 (3%)

Query: 1 MDKEKLDYWKTIITFLHNVLGDNYEIVLHVVDENDIYIGELVNSHISGRTISSPLTTFAL 60
MD+ + + + FL VLG+ YEIV HV+ E+ YI + NSHISGR++ SPLT FA
Sbjct: 1 MDEGQKQQFIKLTYFLGEVLGEQYEIVFHVITEDGAYIAAIANSHISGRSLDSPLTAFAS 60

Query: 61 DLIKNKVYKEKDFVTNYKAIVSPLNKEVRGSTFFIKNAQNELEGMLCINLDISAYQNIAL 120
+L++NK Y EKDF+ +YKA+V +K +RGSTFFIKN ++L G+LCIN D S +++
Sbjct: 61

Query: 121
++DL + ++ IL IS Q + +E LS +I+DI+ + VD S LN + LS
Sbjct: 119

Query: 180
K EI KL+EKG+F +KGAV VA+ L ISEPSVYRYLKK +
Sbjct: 175 2KEEIAEKLYEKGIFNIKGAVPIVAKFLKISEPSVYRYLKKFK 217

A related DNA sequence was identified in S.pyogenes <SEQ ID 4563> which encodes the amino acid
sequence <SEQ ID 4564>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1636 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 169/224 (75%), Positives = 198/224 (87%), Gaps = 3/224 (1%)

Query: 1 MDKEKLDYWKTIITFLHNVLGDNYEIVLHVVDENDIYIGELVNSHISGRTISSPLTTFAL 60

MDKE L+YWKT+ITFLH+VLGDNYEI+LHV+D+NDIYIGELVNSHISGR+  SPLTTFAL
Sbjct: 1 MDKETLNYWKTVITFLHDVLGDNYEIILHVIDKNDIYIGELVNSHISGRSKQSPLTTFAL 60

Query: 61 DLIKNKVYKEKDFVTNYKAIVSPLNKEVRGSTFFIKNAQNELEGMLCINLDISAYQNIAL 120

DLI NKVYKEKDFVTNYKAIVSP +KEVRGSTFFIK+ + LEGMLCINLDISAYQ +A
Sbjct: 61 DLITNKVYKEKDFVTNYKAIVSPQHKEVRGSTFFIKDKKGNLEGMLCINLDISAYQGVAR 120

Query: 121 DILDLVNLNVNKILP--KSPQKISLPQQEEPVEVLSGNIQDIISEIVDPSLLNQNIHLSQ 178
D+L LVNLN+ +P K P+ ++ PQ EE VE+L+ NIQDII +I+DPSLL N+HLSQ
Sbjct: 121 DLLKLVNLNLEHFIPTAKEPKTVT-PQPEEAVTLILTSNIQDIIGQIIDPSLLRHNVHLSQ 179

Query: 179 EVKVEIVSKLHEKGVFQLKGAVSKVAEVLNISEPSVYRYLKKIE 222

+VK++IV+KL+EKGVFQLKGAVSKVA++L ISEPSVYRYLKKIE 
Sbjct: 180 DVKIDIVAKLYEKGVFQLKGAVSKVADILCISEPSVYRYLKKIE 223 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1484 

A DNA sequence (GBSx1570) was identified in S.agalactiae <SEQ ID 4565> which encodes the amino
acid sequence <SEQ ID 4566>. This protein is predicted to be regulatory protein pfoR. Analysis of this 
protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq. 
The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA60239 GB:X86525 pfoS [Clostridium perfringens] 
Identities = 96/147 (65%), Positives = 122/147 (82%)

Query: 100 GTGIIPGFLAGYLVGFLVKWMERNIPGGLDLISIIIIGAPLTRLVAKLLTPLINSTLLTI 159

G GI+PGF+AGYL F++K++E+ IP GLDLI II ++GAPL R +A + PL+ +TL I 
Sbjct: 1 GFGILPGFIAGYLGSFVIKFLEKKIPAGLDLIVIIVLGAPLVRGIAAISNPLVETTLQNI 60 

Query: 160 GDILTSGAHSNPILMGIILGGTIVVVATAPLSSMALTAMLGLTGMPMAIGALSVFGSSFM 219

G ++T+ + ++PI+MGIILGG + VVATAPLSSMALTAMLGLTG+PMAIGAL+VFGSSFM
Sbjct: 61 GGVITATSTASPIMMGIILGGIVTVVATAPLSSMALTAMLGLTGLPMAIGALAVFGSSFM 120

Query: 220 NGVLFHKLKLGSRKDNIAFAVEPLTQA 246 

N V F K+K GS+KD IA A+EPLTQA 
Sbjct: 121 NLVFFGKMKFGSKKDTIAVAIEPLTQA 147 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4567> which encodes the amino acid 
sequence <SEQ ID 4568>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have a cleavable N-term signal seq. 
Final Results

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA60239 GB:X86525 pfoS [Clostridium perfringens]
Identities = 95/147 (64%), Positives = 123/147 (83%)

Query: 100 GTGIIPGFVAGYVVSFLIKWMEKNIPGGLDLISIIIVGAPLTRFIAQLITPVINSTLLTI 159

G GI+PGF+AGY+ SF+IK++EK IP GLDLI II++GAPL R +A + P++ +TL I 
Sbjct: 1 GFGILPGFIAGYLGSFVIKFLEKKIPAGLDLIVIIVLGAPLVRGIAAISNPLVETTLQNI 60 

Query: 160 GDILTSSANSNPIIMGMILGGTIVVVATAPLSSMALTAMLGLTGIPMAIGALSVFGSSFM 219

G ++T+++ ++PI+MG+ILGG + VVATAPLSSMALTAMLGLTG+PMAIGAL+VFGSSFM
Sbjct: 61 GGVITATSTASPIMMGIILGGIVTVVATAPLSSMALTAMLGLTGLPMAIGALAVFGSSFM 120

Query: 220 NGVLFYRLKLGERKDNIAFAIEPLTQA 246
N V F ++K G +KD IA AIEPLTQA

Sbjct: 121 NLVFFGKMKFGSKKDTIAVAIEPLTQA 147 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 302/339 (89%) , Positives = 330/339 (97%) 


Query: 1 MNIIIGTSLLILVLAIFTLFNYKAPYGTKAMGAIASAACASFLVEAFQDSFFGKVLGFQF 60

M+IIIGTSLLILVLAIF+LFNYKAP+G KAMGAIASAACASFLVEAFQDSFFGKVLGFQF
Sbjct: 1 MDIIIGTSLLILVLAIFSLFNYKAPHGAKAMGAIASAACASFLVEAFQDSFFGKVLGFQF 60

Query: 61 LSEVGGANGSLSGVAAAILVAIAIGVTPGYAVLIGLSVSGTGIIPGFLAGYLVGFLVKWM 120

LSEVGGANGSLSGVAAAILVAIAIGV+PGYAVLIGLSVSGTGIIPGF+AGY+V FL+KWM 
Sbjct: 61 LSEVGGANGSLSGVAAAILVAIAIGVSPGYAVLIGLSVSGTGIIPGFVAGYVVSFLIKWM 120

Query: 121 ERNIPGGLDLISIIIIGAPLTRLVAKLLTPLINSTLLTIGDILTSGAHSNPILMGIILGG 180
E+NIPGGLDLISIII+GAPLTR +A+L+TP+INSTLLTIGDILTS A+SNPI+MG+ILGG

Sbjct: 121 EKNIPGGLDLISIIIVGAPLTRFIAQLITPVINSTLLTIGDILTSSANSNPIIMGMILGG 180

Query: 181 TIVWATAPLSSMALTAMLGLTGMPMAIGALSVFGSSFMNGVLFHKLKLGSRKDNIAFAV 240
TIVWATAPLSSMALTAMLGLTG+PMAIGALSVFGSSFMNGVLF++LKLG RKDNIAFA+ 
Sbjct: 181 TIVWATAPLSSMALTAMLGLTGIPMAIGALSVFGSSFMNGVLFYRLKLGERKDNIAFAI 240

Query: 241 EPLTQADVTSANPIPIYVTNFVGGAACGILIALMKLVNDTPGTATPIAGFAVMFAYNPMI 300

EPLTQADVTSANPIPIYVTNFVGGAACG+LIALMKLVNDTPGTATPIAGFAVMFAYNP+
Sbjct: 241 EPLTQADVTSANPIPIYVTNFVGGAACGVLIALMKLVNDTPGTATPIAGFAVMFAYNPVA 300


Query: 301 KVLITALGCIILSLLAGYFGGIVFKDYKLVTKEELQARD 339 

KVLITALGCII+SL+ GY GG VFK+Y+LVTK+ELQAR+
Sbjct: 301 KVLITALGCIIISLIVGYIGGSVFKNYRLVTKQELQARN 339
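The percentages printed in the alignment summary lines can be checked against the raw counts. The Python sketch below assumes a rule that is inferred only from the numbers printed in this section, not taken from any documentation of the alignment program: the identity percentage is the integer part of 100 x identities / length, and the positive percentage adds the separately truncated share of positions that are positive but not identical. Under that assumption 437 identities and 457 positives over 476 positions give 91 + 4 = 95%, which is what Example 1488 prints, whereas truncating 457/476 directly would give 96%. The names `parse_summary`, `predicted_percentages` and `check_line` are hypothetical.

```python
import re

# Alignment summary line; tolerates the stray spaces around commas
# that appear in this document.
SUMMARY = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)\s*,\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return (identities, positives, length, ident_pct, pos_pct), or None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    ident, length, ident_pct, pos, pos_len, pos_pct = map(int, m.groups())
    if pos_len != length:
        raise ValueError("identities and positives use different lengths")
    return ident, pos, length, ident_pct, pos_pct


def predicted_percentages(ident, pos, length):
    """Percentages under the rule inferred from this document (an assumption):
    truncate the identity share, then add the separately truncated share of
    positions that are positive but not identical."""
    ident_pct = 100 * ident // length
    pos_pct = ident_pct + 100 * (pos - ident) // length
    return ident_pct, pos_pct


def check_line(line):
    """True if the printed percentages agree with the inferred rule."""
    ident, pos, length, ident_pct, pos_pct = parse_summary(line)
    return predicted_percentages(ident, pos, length) == (ident_pct, pos_pct)


if __name__ == "__main__":
    line = "Identities = 437/476 (91%) , Positives = 457/476 (95%)"
    print(check_line(line), predicted_percentages(437, 457, 476))
```

The same rule also accounts for the adenylosuccinate synthetase hit in Example 1485 (320 identities and 378 positives over 427 positions printed as 74% and 87%), which plain truncation of 378/427 would not.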

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1485 

A DNA sequence (GBSx1571) was identified in S.agalactiae <SEQ ID 4569> which encodes the amino
acid sequence <SEQ ID 4570>. This protein is predicted to be adenylosuccinate synthetase (purA). Analysis
of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0560 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
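Each "Final Results" block gives a certainty for three bacterial localisations and flags each one as "Affirmative" or "Not Clear". A minimal Python sketch for reading such a block follows. It assumes that the localisation to report is the Affirmative entry with the highest certainty, which is an interpretation of the output rather than a documented PSORT rule, and it tolerates spaces inside numbers (such as "0 . 0000") of the kind introduced by text extraction. `parse_final_results` and `predicted_site` are hypothetical names.

```python
import re

# One localisation line of a PSORT "Final Results" block. Spaces inside the
# number (an extraction artefact in this document) are tolerated.
RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9][0-9 .]*?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Map each localisation to (certainty, verdict)."""
    results = {}
    for m in RESULT.finditer(text):
        site, number, verdict = m.groups()
        results[site] = (float(number.replace(" ", "")), verdict)
    return results


def predicted_site(results):
    """Affirmative localisation with the highest certainty, or None (assumed reading)."""
    affirmative = [(c, site) for site, (c, v) in results.items() if v == "Affirmative"]
    return max(affirmative)[1] if affirmative else None


block = """bacterial cytoplasm Certainty=0. 0560 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco"""

results = parse_final_results(block)
print(results, predicted_site(results))
```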



The protein has homology with the following sequences in the GENPEPT database. 
>GP:CAB16079 GB:Z99124 adenylosuccinate synthetase [Bacillus subtilis]
Identities = 320/427 (74%) , Positives = 378/427 (87%) 

Query: 1 MTSVVVVGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVIDNKKFKLHLIPSGIF 60

M+SVVVVGTQWGDEGKGKITDFLS +AEVIARYQGG+NAGHTI D +KLHLIPSGIF
Sbjct: 1 MSSVVVVGTQWGDEGKGKITDFLSENAEVIARYQGGNNAGHTIKFDGITYKLHLIPSGIF 60

Query: 61 FKEKISVIGNGVVVNPKSLVKELAYLHGEGVTTDNLRISDRAHVILPYHIKLDQLQEDAK 120

+K+K VIGNG+VV+PK+LV ELAYLH V+TDNLRIS+RAHVILPYH+KLD+++E+ K
Sbjct: 61 YKDKTCVIGNGMWDPKALVTEIjAYLiHERIWSTDNI^ 120 

Query: 121 GDNKIGTTIKGIGPAYMDKAARVGIRIADLLDREVFAERLKINLAEKNRLFEKMYDSTPL 180 

G NKIGTT KGIGPAYMDKAAR+GIRIADLLDR+ FAE+L+ NL EKNRL EKMY++ 
Sbjct: 121 GANKIGTTKKGIGPAYMDKAARIGIRIADLLDRDAFAEKLERNLEEKNRLLEKMYETEGF 180 

Query: 181 EFDDIFEEYYEYGQQIKQYVTDTSVILNDALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 240 

+ +DI +EYYEYGQQIK+YV DTSV+LNDALD G+RVLFEGAQGVMLDIDQGTYPFVTSS 
Sbjct: 181 KLEDILDEYYEYGQQIKKYVCDTSVVLNDALDEGRRVLFEGAQGVMLDIDQGTYPFVTSS 240

Query: 241 NPVAGGVTIGSGVGPSKINKVVGVCKAYTSRVGDGPFPTELFDEVGDRIREIGKEYGTTT 300

NPVAGGVTIGSGVGP+KI VVGV KAYT+RVGDGPFPTEL DE+GD+IRE+G+EYGTTT
Sbjct: 241 NPVAGGVTIGSGVGPTKIKHVVGVSKAYTTRVGDGPFPTELKDEIGDQIREVGREYGTTT 300

Query: 301 GRPRRVGWFDSVVMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPASL 360

GRPRRVGWFDSVV+RH+RRVSGIT+LSLNSIDVL+G++T+KICVAY G+ I+ +PASL
Sbjct: 301 GRPRRVGWFDSVVVRHARRVSGITDLSLNSIDVLAGIETLKICVAYRYKGEIIEEFPASL 360

Query: 361 EQLKRCKPIYEELPGWSEDITACRSLDDLPENARNYVRRVGELVGVRISTFSVGPGREQT 420

+ L C+P+YEE+PGW+EDIT +SL +LPENAR+Y+ RV +L G+ +S FSVGP R QT
Sbjct: 361 KALAECEPVYEEMPGWTEDITGAKSLSELPENARHYLERVSQLTGIPLSIFSVGPDRSQT 420

Query: 421 NILESVW 427 

N+L SV+ 
Sbjct: 421 NVLRSVY 427 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4571> which encodes the amino acid
sequence <SEQ ID 4572>. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.0560 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 406/430 (94%) , Positives = 421/430 (97%) 

Query: 1 MTSVVVVGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVIDNKKFKLHLIPSGIF 60

MTSVVVVGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVID KKFKLHLIPSGIF
Sbjct: 1 MTSVVVVGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVIDGKKFKLHLIPSGIF 60

Query: 61 FKEKISVIGNGVVVNPKSLVKELAYLHGEGVTTDNLRISDRAHVILPYHIKLDQLQEDAK 120

F +KISVIGNGVVVNPKSLVKELAYLH EGVTTDNLRISDRAHVILPYHI +LDQLQEDAK
Sbjct: 61 FPQKISVIGNGVVVNPKSLVKELAYLHDEGVTTDNLRISDRAHVILPYHIQLDQLQEDAK 120

Query: 121 GDNKIGTTIKGIGPAYMDKAARVGIRIADLLDREVFAERLKINLAEKNRLFEKMYDSTPL 180 

GDNKIGTTIKGIGPAYMDKAARVGIRIADLLD+++FAERL+INLAEKNRLFEKMYDSTPL 
Sbjct: 121 GDNKIGTTIKGIGPAYMDKAARVGIRIADLLDKDIFAERLRINLAEKNRLFEKMYDSTPL 180 

Query: 181 EFDDIFEEYYEYGQQIKQYVTDTSVILNDALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 240 

+FD IFEEYY YGQ+IKQYVTDTSVILNDALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 
Sbjct: 181 DFDAIFEEYYAYGQEIKQYVTDTSVILNDALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 240 

Query: 241 NPVAGGVTIGSGVGPSKINKVVGVCKAYTSRVGDGPFPTELFDEVGDRIREIGKEYGTTT 300
NPVAGGVTIGSGVGP+KINKVVGVCKAYTSRVGDGPFPTELFDEVG+RIRE+G EYGTTT
Sbjct: 241 NPVAGGVTIGSGVGPNKINKVVGVCKAYTSRVGDGPFPTELFDEVGERIREVGHEYGTTT 300

Query: 301 GRPRRVGWFDSVVMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPASL 360
GRPRRVGWFDSVVMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPA+L

Sbjct: 301 GRPRRVGWFDSVVMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPANL 360

Query: 361 EQLKRCKPIYEELPGWSEDITACRSLDDLPENARNYVRRVGELVGVRISTFSVGPGREQT 420
EQLKRCKPIYEELPGW EDIT RSLD+LPENARNYVRRVGELVGVRISTFSVGPGREQT
Sbjct: 361 EQLKRCKPIYEELPGWQEDITGVRSLDELPENARNYVRRVGELVGVRISTFSVGPGREQT 420

Query: 421 NILESVWSNI 430 

NILESVW++I 
Sbjct: 421 NILESVWASI 430 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1486 

A DNA sequence (GBSx1572) was identified in S.agalactiae <SEQ ID 4573> which encodes the amino
acid sequence <SEQ ID 4574>. Analysis of this protein sequence reveals the following:
Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.29 Transmembrane 30 - 46 ( 22 - 55) 
INTEGRAL Likelihood = -2.97 Transmembrane 110 - 126 ( 109 - 126) 
INTEGRAL Likelihood = -olll Transmembrane 89 - 105 ( 89 - 106)
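Each INTEGRAL line gives a likelihood, a core residue range and a wider range in parentheses for one candidate transmembrane segment. A minimal Python parser is sketched below, assuming both ranges are 1-based and inclusive; under that reading every core range printed in this section spans exactly 17 residues (for example 30 - 46). Lines whose likelihood did not survive extraction, such as the "-olll" line above, simply fail to match and are skipped. `TMSegment` and `parse_integral` are hypothetical names.

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    start: int      # core range, assumed 1-based and inclusive
    end: int
    ext_start: int  # wider range printed in parentheses
    ext_end: int

    @property
    def length(self):
        return self.end - self.start + 1


INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(text):
    """Return one TMSegment per well-formed INTEGRAL line; garbled lines are skipped."""
    segments = []
    for m in INTEGRAL.finditer(text):
        likelihood, *bounds = m.groups()
        segments.append(TMSegment(float(likelihood), *map(int, bounds)))
    return segments


text = """INTEGRAL Likelihood = -9.29 Transmembrane 30 - 46 ( 22 - 55)
INTEGRAL Likelihood = -2.97 Transmembrane 110 - 126 ( 109 - 126)
INTEGRAL Likelihood = -olll Transmembrane 89 - 105 ( 89 - 106)"""

for seg in parse_integral(text):
    print(seg.start, seg.end, seg.length, seg.likelihood)
```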

Final Results 

bacterial membrane Certainty=0.4715 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8823> which encodes amino acid sequence <SEQ ID 8824> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 10 
SRCFLG: 0

McG: Length of UR: 5 

Peak Value of UR: 3.05 
Net Charge of CR: 0 
McG: Discrim Score: 4.64 
GvH: Signal Score (-7.5): -1.66 1

Possible site: 36 
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 37
ALOM program count: 2 value: -2.97 threshold: 0.0
INTEGRAL Likelihood = -2.97 Transmembrane 100 - 116 ( 99 - 116)

PERIPHERAL Likelihood = 1.38 56 
modified ALOM score: 1.09 
icml HYPID: 7 CFP: 0.219 

*** Reasoning Step: 3

Final Results 

bacterial membrane Certainty=0.2190 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database and no 
corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1487 

A DNA sequence (GBSx1573) was identified in S.agalactiae <SEQ ID 4575> which encodes the amino
acid sequence <SEQ ID 4576>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1488 

A DNA sequence (GBSx1574) was identified in S.agalactiae <SEQ ID 4577> which encodes the amino
acid sequence <SEQ ID 4578>. This protein is predicted to be SgaT protein (sgaT). Analysis of this protein 
sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC77150 GB:AE000491 orf, hypothetical protein [Escherichia coli K12]
Identities = 181/451 (40%) , Positives = 274/451 (60%) , Gaps = 25/451 (5%) 

Query: 11 FSQNILQNPAFFVGLLVLIGYLLLKKPLHDVFAGFIKATVGYLILNVGAGGLVNTFRPIL 70 

F ++ N +G++ +GY+LL+K + + G IK +G+++L G+G L +TF+P++ 
Sbjct: 30 FFNQVMTNAPLLLGIVTCLGYILLRKSVSVIIKGTIKTIIGFMLLQAGSGILTSTFKPVV 89

Query: 71 VALAKKFNLEAAVIDPYFGLASANAKLETMG-FISVATTALLIGFGINILLVALRKVTKV 129

+++ + + A+ D Y AS A ++ MG S A+L+ +NI V LR++T + 
Sbjct: 90 AKMSEVYGINGAI SDTY ASMI^TIDRMGDAYSWGYAVLLALALNI CYVLLRRITGI 146 



Final Results 

bacterial cytoplasm Certainty=0.0967 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



Final Results

bacterial membrane Certainty=0.4121 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



Query: 130 RTLFITGHIMVQQAATISVFVLLLIPQLRNGFGAWAVGIICGLYWAVSSNMTVEAT 185

RT+ +TGHIM QQA I+V + + G+ W 1+ LYW ++SNM + T 

Sbjct: 147 RTIMLTGHIMFQQAGLIAVTLFIF GYSMWTTIICTAILVSLYWGITSNMMYKPT 200 
Query: 186 QRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVASATLML 245

Q +T G GF+IGHQQQFA W KVAPF GKKEE++++LKLP +LNIFHD +V++A +M 
Sbjct: 201 QEVTDGCGFSIGHQQQFASWIAYKVAPFLGKKEESVEDLKLPGWLNIFHDNIVSTAIVMT 260 

Query: 246 VFFGGILAVLGPDIMSNVKLIGPGAFVPTKQAFFMYILQTSLTFSVYLFILMQGVRMFVT 305

+FFG IL G D + + K + +YILQT +F+V +FI+ QGVRMFV 

Sbjct: 261 IFFGAILLSFGIDTVQ--AMAGKVHWTVYILQTGFSFAVAIFIITQGVRMFVA 311

Query: 306 ELTNAFQGISNKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLVVFKNPI 365

EL+ AF GIS +L+PG+ A+D AA Y F + NAV+ GF +G IGQLI + +LV + I 
Sbjct: 312 ELSEAFNGISQRLIPGAVLAIDCAAIYSF-APNAVVVGFMWGTIGQLIAVGILVACGSSI 370

Query: 366 LIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGIIQVALGAVAVGLLGIAGGYHGN 425

LII GF+P+FF NA I V+A+ GGW+AA+ + + G+I++ AV L G++ + G 

Sbjct: 371 LIIPGFIPMFFSNATIGVFANHFGGWRAALKICLVMGMIEIFGCVWAVKLTGMS-AWMGM 429 

Query: 426 IDFEFPWLAFGYIFKYLGIAGYVIVCLFFLA 456 

D+ F +GIA ++ + LA 

Sbjct: 430 ADWSILAPPMMQGFFSIGIAFMAVTIVIALA 460

A related DNA sequence was identified in S.pyogenes <SEQ ID 4579> which encodes the amino acid
sequence <SEQ ID 4580>. Analysis of this protein sequence reveals the following:

Possible site: 43 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -5.63 Transmembrane 105 - 121 ( 100 - 127)
INTEGRAL Likelihood = -5.52 Transmembrane 138 - 154 ( 137 - 155)
INTEGRAL Likelihood = -5.20 Transmembrane 400 - 416 ( 392 - 422)
INTEGRAL Likelihood = -4.78 Transmembrane 18 - 34 ( 14 - 39)
INTEGRAL Likelihood = -2.97 Transmembrane 365 - 381 ( 365 - 383)
INTEGRAL Likelihood = -1.49 Transmembrane 160 - 176 ( 160 - 177)
INTEGRAL Likelihood = -0.53 Transmembrane 41 - 57 ( 41 - 57)


Final Results

bacterial membrane Certainty=0.5203 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC77150 GB:AE000491 orf, hypothetical protein [Escherichia coli]

Identities = 182/461 (39%) , Positives = 279/461 (60%) , Gaps = 25/461 (5%)

Query: 1 MEMLLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKPIYEVFAGFVKATVGYLILNVGAG 60 

ME+L F ++ N +G++ +GY+LL+K + + G +K +G+++L G+G 

Sbjct: 20 MEILYNIFTVFFNQVMTNAPLLLGIVTCLGYILLRKSVSVIIKGTIKTIIGFMLLQAGSG 79 

Query: 61 GLVVTFRPILVALAKKFELKAAVIDPYFGLAAANTKLEEMG-FISVATTALLIGFGVNIL 119

L +TF+P++ +++ + + A+ D Y + A ++ MG S A+L+ +NI 
Sbjct: 80 I LTSTFKPWAKMSE VYGINGAI SDTYASMMAT 1 DRMGDAYSWVGYA VLLALALNI C 136 

Query: 120 LVALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQFQNAFGAWAVGIICGLYWA 175

V LR++T +RT+ +TGHIM QQA I+V + + + W I+ LYW

Sbjct: 137 YVLLRRITGIRTIMLTGHIMFQQAGLIAVTLFIF GYSMWTTI I CTAI LVSLYWG 190 

Query: 176 ISSNMTVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHD 235

I+SNM + TQ +T G GF+IGHQQQFA W KVAPF GKKEE++++LKLP +LNIFHD 
Sbjct: 191 ITSNMMYKPTQEVTDGCGFSIGHQQQFASWIAYKVAPFLGKKEESVEDLKLPGWLNIFHD 250 



Query: 236 TVVASATLMLVFFGAILAVLGPDIMSDVDLIGPGAFNPAKQAFFMYILQTSLTFSVYLFI 295

+V++A +M +FFGAIL G D + + K + +YILQT +F+V +FI 

Sbjct: 251 NIVSTAIVMTIFFGAILLSFGIDTVQAM AGKVHWTVYILQTGFSFAVAIFI 301 
Query: 296 LMQGVRMFVSELTNAFQGISSKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITI 355

+ QGVRMFV+EL+ AF GIS +L+PG+ A+D AA Y F + NAV+ GF +G IGQLI +
Sbjct: 302 ITQGVRMFVAELSEAFNGISQRLIPGAVLAIDCAAIYSF-APNAVVVGFMWGTIGQLIAV 360


Query: 356 ALLVIFKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGILQVALGAVAVGL 415

+LV + ILII GF+P+FF NA I V+A+ GGW+AA+ + + G++++ AV L 

Sbjct: 361 GILVACGSSILIIPGFIPMFFSNATIGVFANHFGGWRAALKICLVMGMIEIFGCVWAVKL 420

Query: 416 LGLTGGYHGNIDLVLPWLPFGYLFKFLGIAGYVLVCIFLLA 456

G++ + G D + P F +GIA ++ + LA 
Sbjct: 421 TGMS-AWMGMADWSILAPPMMQGFFSIGIAFMAVIIVIALA 460

An alignment of the GAS and GBS proteins is shown below. 

Identities = 437/476 (91%) , Positives = 457/476 (95%)

Query: 1 MENFLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKPLHDVFAGFIKATVGYLILNVGAG 60

ME  LAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKP+++VFAGF+KATVGYLILNVGAG
Sbjct: 1 MEMLLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKPIYEVFAGFVKATVGYLILNVGAG 60


Query: 61 GLVNTFRPILVALAKKFNLEAAVIDPYFGLASANAKLETMGFISVATTALLIGFGINILL 120

GLV TFRPILVALAKKF L+AAVIDPYFGLA+AN KLE MGFISVATTALLIGFG+NILL
Sbjct: 61 GLVVTFRPILVALAKKFELKAAVIDPYFGLAAANTKLEEMGFISVATTALLIGFGVNILL 120

Query: 121 VALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQLRNGFGAWAVGIICGLYWAVSSNM 180

VALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQ +N FGAWAVGIICGLYWA+SSNM 
Sbjct: 121 VALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQFQNAFGAWAVGIICGLYWAISSNM 180 

Query: 181 TVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVAS 240
TVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVAS

Sbjct: 181 TVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVAS 240

Query: 241 ATLMLVFFGGILAVLGPDIMSNVKLIGPGAFVPTKQAFFMYILQTSLTFSVYLFILMQGV 300
ATLMLVFFG ILAVLGPDIMS+V LIGPGAF P KQAFFMYILQTSLTFSVYLFILMQGV
Sbjct: 241 ATLMLVFFGAILAVLGPDIMSDVDLIGPGAFNPAKQAFFMYILQTSLTFSVYLFILMQGV 300

Query: 301 RMFVTELTNAFQGISNKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLVV 360

RMFV+ELTNAFQGIS+KLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLV+
Sbjct: 301 RMFVSELTNAFQGISSKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLVI 360


Query: 361 FKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGIIQVALGAVAVGLLGLAG 420 

FKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGI+QVALGAVAVGLLGL G 
Sbjct: 361 FKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGILQVALGAVAVGLLGLTG 420 

Query: 421 GYHGNIDFEFPWLAFGYIFKYLGIAGYVIVCLFFLAIPQLQFMKSKDKEAYYRGDA 476

GYHGNID PWL FGY+FK+LGIAGYV+VC+F LAIPQLQF K+KDKEAYYRG+A
Sbjct: 421 GYHGNIDLVLPWLPFGYLFKFLGIAGYVLVCIFLLAIPQLQFAKAKDKEAYYRGEA 476

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1489 

A DNA sequence (GBSx1575) was identified in S.agalactiae <SEQ ID 4581> which encodes the amino
acid sequence <SEQ ID 4582>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1225 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG34743 GB:AE000033 similar to PTS system: EIIB [Mycoplasma pneumoniae] 
Identities = 40/89 (44%), Positives = 62/89 (68%), Gaps = 1/89 (1%) 

Query: 4 VLTACGNGMGSSMVIKMKVENALRQLGVSNFESASCSVGEAKGLAANYDIVVASNHLIHE 63

++ ACGNGMG+SM+IK+KVE +++LG + A S+G+ KG+ + DI+++S HL E
Sbjct: 8 IIAACGNGMGTSMLIKIKVEKIMKELGYTAKVEA-LSMGQTKGMEHSADIIISSIHLTSE 66

Query: 64 LDGRTKGHLVGLDNLMDDNEIKTKLQEIL 92
+ K +VG+ NLMD+NEIK L ++L

Sbjct: 67 FNPNAKAKIVGVLNLMDENEIKQALSKVL 95

A related DNA sequence was identified in S.pyogenes <SEQ ID 4583> which encodes the amino acid 

sequence <SEQ ID 4584>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0977 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 85/92 (92%) , Positives = 90/92 (97%) 


Query: 1 MVKVLTACGNGMGSSMVIKMKVENALRQLGVSNFESASCSVGEAKGLAANYDIVVASNHL 60

MVKVLTACGNGMGSSMVIKMKVENALRQLGV++ +SASCSVGEAKGLA+ YDIVVASNHL
Sbjct: 1 MVKVLTACGNGMGSSMVIKMKVENALRQLGVTDIQSASCSVGEAKGLASGYDIVVASNHL 60

Query: 61 IHELDGRTKGHLVGLDNLMDDNEIKTKLQEIL 92

IHELDGRTKGHLVGLDNLMDDNEIKTKLQE+L 
Sbjct: 61 IHELDGRTKGHLVGLDNLMDDNEIKTKLQEVL 92 
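In these alignments the middle line shows the residue where query and subject agree, "+" where they differ but score positively, and a blank otherwise. The Python sketch below rebuilds such a line for the second block of the GAS/GBS alignment above. It relies on a deliberately partial list of positive-scoring pairs taken from BLOSUM62 (an assumption, since the scoring matrix is not stated here), so any pair missing from `POSITIVE_PAIRS` is treated as non-positive; `match_line` and `identities` are hypothetical names.

```python
# A deliberately partial list of amino-acid pairs with a positive BLOSUM62
# score (an assumption of this sketch); unlisted pairs count as non-positive.
POSITIVE_PAIRS = {
    frozenset(pair)
    for pair in (
        "IV", "IL", "LV", "LM", "IM", "MV", "FY", "WY", "FW",
        "DE", "EQ", "KR", "KQ", "RQ", "ND", "ST", "AS", "NS",
        "EK", "HY", "HN",
    )
}


def match_line(query, sbjct):
    """Build a BLAST-style middle line for two equal-length aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    marks = []
    for q, s in zip(query, sbjct):
        if q == s and q != "-":
            marks.append(q)          # identical residue
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            marks.append("+")        # different but positively scoring
        else:
            marks.append(" ")        # mismatch or gap
    return "".join(marks)


def identities(query, sbjct):
    """Number of aligned positions carrying the same residue."""
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


query = "IHELDGRTKGHLVGLDNLMDDNEIKTKLQEIL"
sbjct = "IHELDGRTKGHLVGLDNLMDDNEIKTKLQEVL"
print(match_line(query, sbjct))
print(identities(query, sbjct), "/", len(query))
```

For this block the rebuilt line agrees with the printed one, including the "+" for the I/V substitution at position 91.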

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1490 

A DNA sequence (GBSx1576) was identified in S.agalactiae <SEQ ID 4585> which encodes the amino
acid sequence <SEQ ID 4586>. This protein is predicted to be a pentitol phosphotransferase enzyme II, A
component (ptxA). Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3309 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC77152 GB:AE000491 putative PTS system enzyme II A component
[Escherichia coli K12]

Identities = 64/150 (42%) , Positives = 97/150 (64%) , Gaps = 2/150 (1%) 

Query: 1 MNLKQAFIENDSIRLKLSASDWKEAIKLSIDPLIESGAVDAEYYDAIIESTEEFGPYYIL 60 
M L+ + EN SIRL+ A W+EA+K+ +D L+ + V+ YY AI++ E+FGPY+++ 
Sbjct: 1 MKLRDSLAENKSIRLQAEAETWQEAVKIGVDLLVAADVVEPRYYQAILDGVEQFGPYFVI 60

Query: 61 MPGMAMPHARPEAGVKRDAFSLITLTEPVVF--PDGKEVSVLLALAATSSAIHTSVAIPQ 118
PG+AMPH RPE GVK+ FSL+TL +P+ F D V +L+ +AA + H V I Q
Sbjct: 61 APGLAMPHGRPEEGVKKTGFSLVTLKKPLEFOTDDNDPVDILITMAAVDANTHQEVGIMQ 120

Query: 119 IIALFELENSIQRLTECQEAKEVLAMVEES 148 
I+ LFE E + RL C+ +EVL +++ +

Sbjct: 121 IVNLFEDEENFDRLRACRTEQEVLDLIDRT 150 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4587> which encodes the amino acid 
sequence <SEQ ID 4588>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2287 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 113/161 (70%) , Positives = 137/161 (84%) 

Query: 1 MNLKQAFIENDSIRLKLSASDWKEAIKLSIDPLIESGAVDAEYYDAIIESTEEFGPYYIL 60

MNLKQAFI+N+SIRL LSA W+EA++L++ PLI+S AV + YYDAII STE++GPYY+L 
Sbjct: 1 MNLKQAFIDNNSIRLGLSADTWQEAWIAVQPLIDSKAVTSAYYDAIIASTEKVGPYYVL 60

Query: 61 MPGMAMPHARPEAGVKRDAFSLITLTEPVVFPDGKEVSVLLALAATSSAIHTSVAIPQII 120

MPGMAMPHA GV R+AF+LITLT+PV F DGKEVSVLL LAAT + IHT+VAIPQI+
Sbjct: 61 MPGMAMPHAEAGLGVNRNAFALITLTKPVTFSDGKEVSVLLTLAATDPSIHTTVAIPQIV 120

Query: 121 ALFELENSIQRLTECQEAKEVLAMVEESKNSPYLEGLDLES 161 
ALFEL+N+I+RL CQ KEVL MVEESK+SPYLEG+DL +

Sbjct: 121 ALFELDNAIERLVACQSPKEVLEMVEESKDSPYLEGMDLNA 161 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1491

A DNA sequence (GBSx1577) was identified in S.agalactiae <SEQ ID 4589> which encodes the amino
acid sequence <SEQ ID 4590>. This protein is predicted to be a probable hexulose-6-phosphate synthase.
Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1584 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC77153 GB:AE000491 probable hexulose-6-phosphate synthase
[Escherichia coli K12]
Identities = 108/217 (49%) , Positives = 141/217 (64%) , Gaps = 3/217 (1%)

Query: 5 LPNLQVALDHSDLQGAIKAAVSVGHEVDVIEAGTVCLLQVGSELVEVLRSLFPDKIIVAD 64
LP LQVALD+ + A + + EVD+IE GT+ + G V L++L+P KI++AD 
Sbjct: 3 LPMLQVALDNQTMDSAYETTRLIAEEVDIIEVGTILOTGEGVRAVRDLKALYPHKIVLAD 62 






Query: 65 TKCADAGGTVAKNNAVRGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 124

K ADAG +++ ADW+T ICCA I T + AL KE GD +QIEL G WT+ 
Sbjct: 63 AKIADAGKILSRMCFEANADWVWICQffilNTAKGRLDVAKEFNGD VQIELTGYWTW 119 



WO 02/34771 






PCT/GB01/04789 



Query: 125 EQAQQWLDAGISQAIYHQSRDALLAGETWGEKDLNKVKKLIDMGFRVSVTGGLSTDTLQL 184
EQAQQW DAGI Q +YH+SRDA AG WGE D+ +K+L DMGF+V+VTGGL+ + L L 
Sbjct: 120 EQAQQWRDAGIGQVVYHRSRDAQAAGVAWGEADITAIKRLSDMGFKVTVTGGLALEDLPL 179

Query: 185 FEGVDVFTFIAGRGITEADDPAAAARAFKDEIKRIWG 221 

F+G+ + FIAGR I +A P AAR FK I +WG 
Sbjct: 180 FKGIPIHVFIAGRSIRDAASPVEAARQFKRSIAELWG 216 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4591> which encodes the amino acid
sequence <SEQ ID 4592>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1473 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 206/217 (94%) , Positives = 212/217 (96%) 

Query: 5 LPNLQVALDHSDLQGAIKAAVSVGHEVDVIEAGTVCLLQVGSELVEVLRSLFPDKIIVAD 64
+PNLQVALDHSDLQGA+KAAV+VGHEVDVIEAGTVCLLQVGSELVEVLRSLFP+KIIVAD
Sbjct: 4 IPNLQVALDHSDLQGAVKAAVAVGHEVDVIEAGTVCLLQVGSELVEVLRSLFPEKIIVAD 63

Query: 65 TKCADAGGTVAKNNAVRGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 124

TKCADAGGTVAKNNA RGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY
Sbjct: 64 TKCADAGGTVAKNNAKRGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 123


Query: 125 EQAQQWLDAGISQAIYHQSRDALLAGETWGEKDLNKVKKLIDMGFRVSVTGGLSTDTLQL 184 

EQAQ WLDAGISQAIYHQSRDALLAGETWGEKDLNKVK LIDMGFRVSVTGGL DTL+L
Sbjct: 124 EQAQLWLDAGISQAIYHQSRDALLAGETWGEKDLNKVKTLIDMGFRVSVTGGLDVDTLRL 183

Query: 185 FEGVDVFTFIAGRGITEADDPAAAARAFKDEIKRIWG 221

FEGVDVFTFIAGRGITEA+DPAAAARAFKDEIKRIWG 
Sbjct: 184 FEGVDVFTFIAGRGITEAEDPAAAARAFKDEIKRIWG 220 
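Every Query and Sbjct row carries its first and last residue positions, which gives a cheap consistency check on extracted alignments. The Python sketch below assumes 1-based inclusive coordinates and that gap characters ("-") do not consume positions; both rows of the block above satisfy this (37 residues from 185 to 221 and from 184 to 220). The function name is hypothetical, and rows containing stray spaces or symbols fail to parse and are reported as inconsistent.

```python
import re

# "Query: 185 FEGV...IWG 221" -> label, start, residues (with optional gaps), end
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+([A-Za-z*\-]+)\s+(\d+)\s*$")


def row_is_consistent(line):
    """True if end == start + residues - 1, ignoring gap characters.

    Rows that do not parse (stray spaces, OCR debris) return False.
    """
    m = ROW.match(line.strip())
    if m is None:
        return False
    _, start, seq, end = m.groups()
    residues = len(seq.replace("-", ""))
    return int(start) + residues - 1 == int(end)


print(row_is_consistent("Query: 185 FEGVDVFTFIAGRGITEADDPAAAARAFKDEIKRIWG 221"))
print(row_is_consistent("Query: 1 AC-DE 4"))  # toy row: the gap consumes no position
```

Applied to the Query row covering positions 160 to 219 at the very start of this section, the check fails: the row holds 59 residues rather than 60, which suggests a residue was lost during extraction.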

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1492 

A DNA sequence (GBSx1578) was identified in S.agalactiae <SEQ ID 4593> which encodes the amino
acid sequence <SEQ ID 4594>. Analysis of this protein sequence reveals the following: 

Possible site: 36 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4179 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22686 GB:U32783 hexulose-6-phosphate isomerase, putative
[Haemophilus influenzae Rd]
Identities = 143/282 (50%) , Positives = 199/282 (69%) , Gaps = 3/282 (1%)

Query: 5 IGIYEKATPKHFNWLERLQFAKELGFDFVELSIDESDERLARLEWSKEERLELVKAIFET 64 

IGIYEKA PK+ W ERL AK GF+F+E+SIDES++RL+RL W+K ER+ L ++I ++ 
Sbjct: 6 IGIYEKALPKNITWQERLSLAKACGFEFIEMSIDESNDRLSRLNWTKSERIALHQS HQS 65 
Query: 65 GVRVPTITFSGHRRFPMGSNNPEKEARAMDMMKKCIVFAQDIGIRNIQLAGYDVYYEEKS 124

G+ +P++ S HRRFP GS + + ++ ++M+K I + ++GIR IQLAGYDVYYE++ 
Sbjct: 66 GITIPSMCLSAHRRFPFGSKDKKIRQKSFEIMEKAIDLSVWLGIRTIQLAGYDVYYEKQD 125


Query: 125 PETRARFIKNLRQACTWAEEAQVILSIEIMDDPFMNSIEKYLAVEKEIDSPYLFVYPDTG 184

ET F + + A T A AQV L++EIMD PFM+SI ++ + I+SP+ VYPD G 
Sbjct: 126 EETIKYFQEGIEFAVTLAASAQVTLAVEIMDTPFMSSISRWKKWDTIINSPWFTVYPDIG 185 

Query: 185 NVSAWHNDLWSEFYNGHRSIAALHIKDTYAVTETSKGQFRDVPFGQGCVDWEEMFAVIKK 244

N+SAW+N++ E G I+A+H+KDTY VTETSKGQFRDVPFGQGCVD+ F+++KK 
Sbjct: 186 NLSAWNNNIEEELTLGIDKISAIHLKDTYPVTETSKGQFRDVPFGQGCVDFVHFFSLLKK 245 

Query: 245 TNYNGPFLIEMWSENCETVEETRAAIKEAQDFLYPLMEKTGV 286 
NY G FLIEMW+E EE I +A+ ++ MEK G+

Sbjct: 246 LNYRGAFLIEMWTEKNEEPLLEIIQARKWIVQQMEKAGL 284

A related DNA sequence was identified in S.pyogenes <SEQ ID 4595> which encodes the amino acid 
sequence <SEQ ID 4596>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1489 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>






An alignment of the GAS and GBS proteins is shown below. 

Identities = 240/286 (83%), Positives = 271/286 (93%) 

Query: 1 MTRPIGIYEKATPKHFNWLERLQFAKELGFDFVELSIDESDERLARLEWSKEERLELVKA 60 

M RPIGIYEKATPK F W ERLQFAK+LGFDFVE+S+DESD RLARLEW+KEERL+LVKA 
Sbjct: 15 MARPIGIYEKATPKQFTWRERLQFAKDLGFDFVEMSVDESDARLARLEWTKEERLDLVKA 74

Query: 61 IFETGVRVPTITFSGHRRFPMGSNNPEKEARAMDMMKKCIVFAQDIGIRNIQLAGYDVYY 120

I+ETG+R+PTI FSGHRR+P+GSN+P EA+++ +MK+CI AQD+G+R IQLAGYDVYY 
Sbjct: 75 IYETGIRIPTICFSGHRRYPLGSNDPAIEAKSLKLMKQCIELAQDLGVRTIQLAGYDVYY 134 

Query: 121 EEKSPETRARFIKNLRQACTWAEEAQVILSIEIMDDPFMNSIEKYLAVEKEIDSPYLFVY 180 
E+KSPETRARFIKNLRQ+C WAEEAQV+LSIEIMDDPF+NSIEKYLAVEKEIDSPYLFVY

Sbjct: 135 EKKSPETRARFIKNLRQSCDWAEEAQVMLSIEIMDDPFINSIEKYLAVEKEIDSPYLFVY 194 

Query: 181 PDTGNVSAWHNDLWSEFYNGHRSIAALHIKDTYAVTETSKGQFRDVPFGQGCVDWEEMFA 240 
PD GNVSAWHNDLWSEFYNGH+SIAALH+KDTYAVTETSKGQFRDVPFGQGCVDW+E+FA 
Sbjct: 195 PDAGNVSAWHNDLWSEFYNGHKSIAALHLKDTYAVTETSKGQFRDVPFGQGCVDWQELFA 254

Query: 241 VIKKTNYNGPFLIEMWSENCETVEETRAAIKEAQDFLYPLMEKTGV 286

V+KKTNYNGPFLIEMWSENC+TVEET+AAIKEAQDFLYPL+EK G+ 
Sbjct: 255 VLKKTNYNGPFLIEMWSENCDTVEETKAAIKEAQDFLYPLIEKAGL 300


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1493 

A DNA sequence (GBSx1579) was identified in S.agalactiae <SEQ ID 4597> which encodes the amino
acid sequence <SEQ ID 4598>. This protein is predicted to be L-ribulose 5-phosphate 4-epimerase.
Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2559 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45716 GB:AF160811 L-ribulose 5-phosphate 4-epimerase 
[Bacillus stearothermophilus] 
Identities = 143/229 (62%) , Positives = 176/229 (76%) , Gaps = 2/229 (0%) 

Query: 5 LQEMRERVCEANKSLPVHSLVKFTWGNVSEVDREAGLIVIKPSGVDYDQLTPENMVVTDL 64
L+E+++ V EAN LP + LV FTWGNVS +DRE GL+VIKPSGV YD+LT ++MVV DL
Sbjct: 2 LEELKQAVLEANLQLPQYRLVTFTWGNVSGIDRERGLVVIKPSGVAYDKLTIDDMVVVDL 61

Query: 65 EGNIVEGDLNPSSDLPTHVQLYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTHADYF 124
GN+VEGDL PSSD PTH+ LYK +P +GGIVHTHST A WAQAG+ IP GTTHADYF
Sbjct: 62 TGNVVEGDLKPSSDTPTHLWLYKQFPGIGGIVHTHSTWATVWAQAGKGIPALGTTHADYF 121

Query: 125
YG +PC R ++ +E+ AYE ETG VI E F R LDP+ +PG++V HGPF WGKDPA
Sbjct: 122

Query: 185
AV+++VVLEEVAKM T +NP +P + ++D+HYLRKHG NAYYGQ
Sbjct: 180 AVHNAVVLEEVAKMAARTYMLNPNAKPISQTLLDRHYLRKHGANAYYGQ 228

A related DNA sequence was identified in S.pyogenes <SEQ ID 4599> which encodes the amino acid
sequence <SEQ ID 4600>. Analysis of this protein sequence reveals the following:

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2257 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 207/234 (88%) , Positives = 220/234 (93%)

Query: 1 MAKSLQEMRERVCEANKSLPVHSLVKFTWGNVSEVDREAGLIVIKPSGVDYDQLTPENMV 60
MAK+LQEMRERVC ANKSLP H LVKFTWGNVSEV RE G IVIKPSGVDYD LTPENMV
Sbjct: 1 MAKNLQEMRERVCAANKSLPQHGLVKFTWGNVSEVCRELGRIVIKPSGVDYDLLTPENMV 60

Query: 61 VTDLEGNIVEGDLNPSSDLPTHVQLYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTH 120
VTDL+GN+VEGDLNPSSDLPTHV+LYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTH
Sbjct: 61

Query: 121
ADYFYGPVPCARSL++ EV+ AYE+ETG+VI+EEF +R LDPMAVPGIWRNHGPFTWGK
Sbjct: 121 ADYFYGPVPCARSLTKAEVDGAYEQETGNVILEEFSKRGLDPMAVPGIWRNHGPFTWGK 180

Query: 181 DPAQAVYHSVVLEEVAKMNRFTEQINPRVEPAPKYIMDKHYLRKHGPNAYYGQK 234
P QAVYHSVVLEEVA+MNR TEQINPRVEPAP+YIMDKHYLRKHGPNAYYGQK
Sbjct: 181 TPEQAVYHSVVLEEVARMNRLTEQINPRVEPAPRYIMDKHYLRKHGPNAYYGQK 234

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1494

A DNA sequence (GBSx1580) was identified in S.agalactiae <SEQ ID 4601> which encodes the amino
acid sequence <SEQ ID 4602>. This protein is predicted to be transaldolase (tal). Analysis of this protein 
sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.4232 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10149> which encodes amino acid sequence <SEQ ID
10150> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB98962 GB:U67539 transaldolase [Methanococcus jannaschii] 
Identities = 124/214 (57%) , Positives = 157/214 (72%) 

Query: 19 MKYFLDTADVSEIRRLNRLGIVDGVTTNPTIISREGRDFKEVINEICQIVDGPVSAEVTG 78
MK+FLDTA+V EI++ LG+VDGVTTNPT++++EGRDF EV+ EIC+IV+GPVSAEV
Sbjct: 1 MKFFLDTANVEEIKKYAELGLVDGVTTNPTLVAKEGRDFYEVVKEICEIVEGPVSAEVIS 60

Query: 79 LTCDEMVTEAREIAKWSPNVVVKIPMTEEGLAAVSQLSKEGIKTNVTLIFTVAQGLSAMK 138
+ MV EARE+AK + N+V+KIPMT++G+ AV LS EGIKTNVTL+F+ Q L A K

AGAT++SPFVGRL+DIG LI D+ I Y ++E+I AS+R HV AK GA

IAT+P LF HPLTD G+E FLKDWD + K
Sbjct: 181 IATMPPAVMDKLFNHPLTDIGLERFLKDWDEYLK 214

A related DNA sequence was identified in S.pyogenes <SEQ ID 4603> which encodes the amino acid 
sequence <SEQ ID 4604>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1902 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 162/214 (75%) , Positives = 180/214 (83%) 

Query: 19 MKYFLDTADVSEIRRLNRLGIVDGVTTNPTIISREGRDFKEVINEICQIVDGPVSAEVTG 78

MK+FLDTA+V+ I+ +N LG+VDGVTTNPTIISREGRDF+ VI EIC IVDGP+SAEVTG
Sbjct: 1 MKFFLDTANVAAIKAINELGVVDGVTTNPTIISREGRDFETVIKEICDIVDGPISAEVTG 60 

Query: 79 LTCDEMVTEAREIAKWSPNVVVKIPMTEEGLAAVSQLSKEGIKTNVTLIFTVAQGLSAMK 138

LT D MV EAR IAKW NVVVKIPMT EGL A + LSKEGIKTNVTLIFTV+QGL AMK
Sbjct: 61 LTADAMVEEARSIAKWHDNVVVKIPMTTEGLKATNILSKEGIKTNVTLIFTVSQGLMAMK 120

Query: 139 AGATFISPFVGRLEDIGTDAYALIRDLRHIIDFYGFQSEIIAASIRGLAHVEGVAKCGAH 198 

AGAT+ISPF+GRLEDIGTDAY LI DLR IID Y FQ+EIIAASIR AHVE VAK GAH
Sbjct: 121 AGATYISPFIGRLEDIGTDAYQLISDLREIIDLYDFQAEIIAASIRTTAHVEAVAKLGAH 180



Query: 199 IATIPDKTFASLFTHPLTDKGIETFLKDWDSFKK 232

IATIPD FA + HPLT G++TF++DW SFKK 
Sbjct: 181 IATIPDPLFAKMTQHPLTTNGLKTFMEDWASFKK 214



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1495 

A DNA sequence (GBSx1581) was identified in S.agalactiae <SEQ ID 4605> which encodes the amino
acid sequence <SEQ ID 4606>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1263 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14129 GB:Z99115 transcriptional regulator (LacI family)
[Bacillus subtilis]

Identities = 108/331 (32%) , Positives = 188/331 (56%) , Gaps = 12/331 (3%) 

Query: 6 TI SD I ATO^VGVSKAWS YYLNGNYKKMSLQTKEKIRliAI KETGYQPS KIAQSLVTKNTRT 65 
TI D+A GVSK+TVS Y+NG +S + + 1+ AI E Y+PSK+AQ L K ++ 
Sbjct: 10 TIKDVAECAGVSKSTVSRYINGKIDAISPEKVKNIKKAIAEl^RPSKMAQGLKIKKSKL 69

Query: 66 IGWIADITNPFISSVMKGIHDTQMFGYSTOFINSD1TO 125 

IG V+ADITNPF + +G+ + C Q+GYS+ N+DN + E E L +L +V G+IL 
Sbjct: 70 IGFVWADITNPFSVA&FRGVEEVCBQYGYSIMVCNTDNSPEKEREMLLKLEAHSVEGIiIIj 129 

Query: 126 DSVDPNHSFIETLSNDRL--VMVDRQAKDIKVDTVASDNKESTQIFLEKMQEAGYHDIYF 183

++ N + + ++ +++DR+ D+K+DTV +DN+ T+ L+K+ GY D+
Sbjct: 130 NATGENKDVLRAFAEQQIPTILIDRKIPDLKIDTVTTDNRWITKEILQKVYSKGYTDVAL 189

Query: 184 VTYPIEGISTRELRYEGFKEWS-SNPDKLIIITE-DGSTQRILDI IEHSEQKP 235

T PI IS R R ++E+ SN + L+ + ED + L E EQK 

Sbjct: 190 FTEPISSISPRAERAAVYQEMASVQNWGLVRLHEIDVKDKEQLKAELRSFHKEMPEQKK 249 

Query: 236 GFLMMNGPTLLNFMKKLNQSTVSYPEDYGLGSYEDLEWMQVLTPNVSCIKQDSYGIGCLA 295
L +NG +L + + + + P+D G+ ++D EW +++ P ++ I Q S+ +G A

Sbjct: 250 AHLALNGLIMLKIISCMEELGLRIPQDIGIAGFDDTEWYKLIGPGITTIAQPSHDMGRTA 309

Query: 296 AQCLIEKISQGNEPTTARLLEVKNQIVIRQS 326
+ ++++I + + +E++ ++++R+S
Sbjct: 310 MERVLKRIE--GDKGAPQTIELEAKVIMRKS 338

There is also homology to SEQ ID 2366. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1496

A DNA sequence (GBSx1582) was identified in S.agalactiae <SEQ ID 4607> which encodes the amino
acid sequence <SEQ ID 4608>. Analysis of this protein sequence reveals the following: 






Possible site: 40 

>>> Seems to have no N-terminal signal sequence



Final Results 
bacterial cytoplasm Certainty=0.1661 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1497 

A DNA sequence (GBSx1583) was identified in S.agalactiae <SEQ ID 4609> which encodes the amino
acid sequence <SEQ ID 4610>. This protein is predicted to be GLYCERATE DEHYDROGENASE. 
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB50351 GB:AJ248287 GLYCERATE DEHYDROGENASE [Pyrococcus abyssi]
Identities = 123/325 (37%) , Positives = 192/325 (58%) , Gaps = 8/325 (2%) 



Query: 1 MDKKKILVTGIVPKEGLRKLMDRFDVTYSED-RPFSRDYVLEHLSEYDGWLLM-GQKGDK 58
M K ++ +T +P+ G+ L F+V ED R R+ +LE + + D + M ++ D+
Sbjct: 1 MSKPRVFITREIPEVGIEMLEKEFEVEVWEDEREIPREILLEKVKDVDALVTMLSERIDR 60

Query: 59 EMIDAGENLQIISLNAVGFDHVDTAYAKEKGIIVSNSPQAVRVPTAEMTFALILAASKRL 118
E+ + L+I++ AVG+D++D A ++GI V+N+P + TA++ FAL+LA ++ L
Sbjct: 61 EVFERAPRLRIVANYAVGYDNIDVEEATKRGIYVTNTPGVLTDATADLAFALLLATARHL 120

Query: 119 AFYDS IVRSGEW IDPSEQRYQGLTLQGSTLGI YGMGRIGLTVANFAKAFGMTWYN 174
D RSGEW + + + G + G T+GI G GRIG +A A+ F M ++Y
Sbjct: 121 VKGDKFTRSGEWKKRGVAWHPKWFLGYDVYGKTIGIIGFGRIGQAIAICRARGFDMRILYY 180

Query: 175 DVYRLPEDKEKELGVTYLEFDQDIKTADVITIHAPALPSTIHKFNKDVFAKMKNRSYLIN 234
R PE EKED + D+L++ +D + + P T H N++ MK + LIN
Sbjct: 181 SRTRKPE-VEKELNAEFKPLDELLRESDFVVIAVPLNKETYHMINEERLKMMKRTAILIN 239

Query: 235 AARGPIVSEEALIEALKEGEIAGAGLDVFENEPQVSEGLRSLDNVIMSPHAGTGTIEGRR 294
ARG ++ +ALI+ALKEG IAGAGLDV+E EP +E L SLDNV+++PH G+ T R
Sbjct: 240 VARGKVIDTKALIKALKEGWIAGAGLDVYEEEPYYNEELFSLDNVVLTPHIGSATFGARE 299

Query: 295 TLAEEAADNIIAFFDGK-PQNIVNK 318
+A+ A+N+IAF G+ P +VN+
Sbjct: 300 GMAKLVAENLIAFKRGEVPPTLVNR 324





There is also homology to SEQ ID 124. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1498

A DNA sequence (GBSx1585) was identified in S.agalactiae <SEQ ID 4611> which encodes the amino
acid sequence <SEQ ID 4612>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1898 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1499 

A DNA sequence (GBSx1586) was identified in S.agalactiae <SEQ ID 4613> which encodes the amino
acid sequence <SEQ ID 4614>. This protein is predicted to be PTS system, galactitol-specific IIC
component. Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood -13.27 Transmembrane 254 - 270 (245 - 277)
INTEGRAL Likelihood -9.24 Transmembrane 77 - 93 (71 - 100)
INTEGRAL Likelihood -9.24 Transmembrane 367 - 383 (364 - 386)
INTEGRAL Likelihood -8.28 Transmembrane 32 - 48 (26 - 54)
INTEGRAL Likelihood -7.38 Transmembrane 186 - 202 (182 - 215)
INTEGRAL Likelihood -6.26 Transmembrane 158 - 174 (151 - 180)
INTEGRAL Likelihood -5.79 Transmembrane 279 - 295 (276 - 296)
INTEGRAL Likelihood -1.12 Transmembrane 342 - 358 (342 - 359)
INTEGRAL Likelihood -0.00 Transmembrane 308 - 324 (308 - 324)



Final Results 

bacterial membrane Certainty=0.6307 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8825> which encodes amino acid sequence <SEQ ID 8826> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
MCG: Discrim Score: 8.30 
GvH: Signal Score (-7.5): 2.97 

Possible site: 58 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 9 value: -13.27 threshold: 0.0 



INTEGRAL Likelihood -13.27 Transmembrane 321 - 337 (312 - 344)
INTEGRAL Likelihood -9.24 Transmembrane 144 - 160 (138 - 167)
INTEGRAL Likelihood -9.24 Transmembrane 434 - 450 (431 - 453)
INTEGRAL Likelihood -8.28 Transmembrane 99 - 115 (93 - 121)
INTEGRAL Likelihood -7.38 Transmembrane 253 - 269 (249 - 282)
INTEGRAL Likelihood -6.26 Transmembrane 225 - 241 (218 - 247)
INTEGRAL Likelihood -5.79 Transmembrane 346 - 362 (343 - 363)
INTEGRAL Likelihood -1.12 Transmembrane 409 - 425 (409 - 426)
INTEGRAL Likelihood -0.00 Transmembrane 375 - 391 (375 - 391)
PERIPHERAL Likelihood 0.69 188

modified ALOM score: 3.15
*** Reasoning Step: 3



Final Results 

bacterial membrane Certainty=0.6307 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
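The summary line "ALOM program count: 9 value: -13.27 threshold: 0.0" is consistent with counting the INTEGRAL segments whose likelihood is at or below the 0.0 threshold and reporting the most negative likelihood. That reading is an assumption checked only against these two listings; with a strict "below" comparison the -0.00 segment would drop out and the count would be 8. The Python sketch below also confirms that every segment in the SEQ ID 8826 listing lies exactly 67 residues further along than its counterpart in the SEQ ID 4614 listing. The values are transcribed from the output above, and the function names are hypothetical.

```python
# (likelihood, core start, core end) for each INTEGRAL segment, in listed order
SEQ_4614 = [(-13.27, 254, 270), (-9.24, 77, 93), (-9.24, 367, 383),
            (-8.28, 32, 48), (-7.38, 186, 202), (-6.26, 158, 174),
            (-5.79, 279, 295), (-1.12, 342, 358), (-0.00, 308, 324)]
SEQ_8826 = [(-13.27, 321, 337), (-9.24, 144, 160), (-9.24, 434, 450),
            (-8.28, 99, 115), (-7.38, 253, 269), (-6.26, 225, 241),
            (-5.79, 346, 362), (-1.12, 409, 425), (-0.00, 375, 391)]


def alom_summary(segments, threshold=0.0):
    """Assumed rule: count segments with likelihood <= threshold, report the lowest."""
    integral = [likelihood for likelihood, _, _ in segments if likelihood <= threshold]
    return len(integral), min(integral)


def coordinate_shifts(first, second):
    """Set of (start, end) shifts between two listings paired in listed order."""
    return {(s2 - s1, e2 - e1) for (_, s1, e1), (_, s2, e2) in zip(first, second)}


print(alom_summary(SEQ_8826))                 # (9, -13.27)
print(coordinate_shifts(SEQ_4614, SEQ_8826))  # {(67, 67)}
```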

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03909 GB:AP001507 PTS system, galactitol-specific enzyme II,
C component [Bacillus halodurans]

Identities = 92/347 (26%) , Positives = 173/347 (49%) , Gaps = 15/347 (4%) 

Query: 1 MVKTTGLHLPIVDIGWQAGSLTAFSSEIGLSFWFGLLIELGLFLLGITRVEVPSNLWNN 60 
MV G+ L ++D+GW A S A++S + GL++ + + + T+ + ++WN 

Sbjct: 70 MVDRLGVDLNVIDVGWPATSSIAWASWAAFIIPLGLIVNVIMLVTKTTKT-MNVDIWNF 128

Query: 61 FGYMIWGT^YAATGNFILSFAFWFVILYSLVMSEVLADRWSEYYGVKNATINSIHNIE 120 

+ Y +Y+ + I+ V ++L +++ A SE+Y + +1 + I 

Sbjct: 129 WHYTFMAAWYTVSDSIIQALIAAVMFQIVALKVADWTAPMVSEFYELPGVSIATGSTIS 188 


Query: 121 TLIPALILDPLVWLLGVNKVKLHPESLKTKLGIFGEPMTLGFILGVIIGVLGSLRNLASI 180

++ + + Q+ +P++++ + GIFGE + +G ILG IG+L 
Sbjct: 189 YAPGIWLVKGIQKIPGIKHWNADPDTIQRRFGIFGESIFIGLILGAAIGLLAGYNV 244 

Query: 181 DTWGGILGFAVALAAVMTIFPLITGVFASAFAPIAFAVERNKKKESQAEQGALDKKRWFI 240

G ++ +A+AAVM + P + + P++E+ K +1 
Sbjct: 245 ---GEVIEIGMAMAAVMVLMPRMVKILMEGLMPVSESAREWIMKR FGDREIHI 294

Query: 241 AVDDGVGFGEPATIIAGLILVPIMVVISLILPGNEALPVVDLIAIPFMIEAMIAVSKGNI 300 
+D V G P+ I LILVP+ V++++ILPGN LP DL IPF++ ++ ++GNI

Sbjct: 295 GLDAAVLLGHPSVISTALILVPLTVIiLAVILPGN^IiPFGDLATIPFIVAFIVGAARGNI 354 

Query: 301 LKAILNGIIWFSLGLYAASALGPIYTFAVKHFGTALPAGVTLIMSFN 347 
+ ++L G I +L LY A+ + P++T+ ++ +P G LI S + 
Sbjct: 355 IHSVLAGAIMIALSLYMATDIAPVFTKMAENSNFNMPEGSALISSID 401

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1500 

A DNA sequence (GBSx1587) was identified in S.agalactiae <SEQ ID 4615> which encodes the amino
acid sequence <SEQ ID 4616>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1013 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1501

A DNA sequence (GBSx1588) was identified in S.agalactiae <SEQ ID 4617> which encodes the amino
acid sequence <SEQ ID 4618>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1294 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10147> which encodes amino acid sequence <SEQ ID
10148> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76604 GB:AE000435 L-xylulose kinase, cryptic [Escherichia 
coli K12] 

Identities = 156/496 (31%) , Positives = 261/496 (52%) , Gaps = 18/496 (3%) 



Query: 16 YYLSIDYGGTNTKALIFDKLGHQIAVSSFETLKNETQSGHRQVNLVKTWNAITSAIREVI 75
Y+L +D GG+ KA ++D+ G+V QG++++W +IR++
Sbjct: 4 YWLGLDCGGSWLKAGLYDREGREAGVQRLPLCALSPQPGWAERDMAELWQCCMAVIRALL 63

Query: 76 QISKLSPEQISAVACIGHGKGLYLLDNKLEPLEQGILSTDNRAKDLAQYFESK--LDNIW 133
S +S EQI + GKGL+LLD +PL ILS+D RA ++ + ++ + ++
Sbjct: 64 THSGVSGEQIVGIGISAQGKGLFLLDKNDKPLGNAILSSDRRAMEIVRRWQEDGIPEKLY 123

Query: 134 ELTRQHIFPSQSPVILRWLKDYQPETYKSIGAVLSAKDFIRYKLTGKVQQEYGDASGNHW 193
LTRQ ++ +LRWLK+++PE Y IG V+ D++R+ LTG E + S ++
Sbjct: 124 PLTRQTLWTGHPVSLLRWLKEHEPERYAQIGCVMMTHDYLRWCLTGVKGCEESNISESNL 183

Query: 194 INFQTGTYDPAILDFFGIREIENSLPELIDSADLVPGGISSQAAKETGLVEGTPVVGGLF 253
N G YDP + D+ GI EI ++LP ++ SA++ G I++Q A TGL GTPVVGGLF
Sbjct: 184 YNMSLGEYDPCLTDWLGIAEINHALPPVVGSAEIC-GEITAQTAALTGLKAGTPVVGGLF 242

Query: 254 DIDACALGSGVLESDTFSVISGTWNINT--YPSLKPAKQDSGLMTSYFPDRRYLLEASSP 311
D+ + AL +G+ + T + + GTW + + L+ + + Y D. +++ +SP
Sbjct: 243 DWSTALCaGIEDEFTmAVMGTWAVTSGITRGLRDGFjmPYVYGRYVNDGEFIVHEASP 302

Query: 312 TSAGNLNFMLKMLMHQEIDNAKSSGGSIYDNLEEFLTHTDATHHGLIFFPFLYGSNTSQD 371
TS+GNL + G+D+++ LF PFLYGSN +
Sbjct: 303 TSSGNLEWF TAQWGEISFDEINQAVASLPKAGGDLFFLPFLYGSNAGLE 351

Query: 372 ASACFFGLTTKSTKSQMIRAVYEGIAFAHKQHITDLIKSRGSVPKIIRFSGGATNSPAWM 431
++ F+G+ T++ +++A+YEG+ F+H H+ + ++ R + +R +GG +S WM
Sbjct: 352 MTSGFYGMQAIHTRAHLLQAIYEGVVFSHMTHL-NRMRERFTDVHTLRVTGGPAHSDVWM 410

Query: 432 QMFSDILNFPIETVEGTELGGLGGAILARHALDKI-SLKEAVQDMVRVKAIYKPQLSEVK 490
QM +D+ IE + EGGA+AR +EA +D+ P ++ +
Sbjct: 411 QMLADVSGLRIELPQVEETGCFGAALAARVGTGVYHNFSEAQRDLRHPVRTLLPDMTAHQ 470

Query: 491 GYKKKYHAYQKLLETL 506
Y+KKY YQ L+ L
Sbjct: 471 LYQKKYQRYQHLIAAL 486





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1502

A DNA sequence (GBSx1589) was identified in S.agalactiae <SEQ ID 4619> which encodes the amino
acid sequence <SEQ ID 4620>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAG05648 GB:AE004652 hypothetical protein [Pseudomonas aeruginosa]
Identities = 59/235 (25%) , Positives = 104/235 (44%) , Gaps = 9/235 (3%) 

Query: 23 QVQLIKLVKDLGFSRFEIRQELLQDPDRELPALKAEADFYDINLYYSANEDLIK-GGKVN 81 

Q + L+ G R E+R+EL P + AL A + +S+ +L + G++N 

Sbjct: 23 QASFLPLLAMAGAQRVELREELFAGPP-DTEALTAAIQLQGLECVFSSPLELWREDGQLN 81

Query: 82 PYLNKGLKEASQLGAPFIKLNVGQTRNLSKEELEPLKEILKSQTIGIKVENNQDPKAATV 141

PL L+ A GA ++K+++G + +L L L + + VEN+Q P+ + 

Sbjct: 82 PELEPTLRRAEACGAGWLKVSLGLLPE--QPDLAALGRRLARHGLQLLVENDQTPQGGRI 139

Query: 142 ENCQYFMTLVKELQIPISFVFDTANWAFINQDLYQAVNNLACDTTYLHCKNFIQVAGKPH 201 
E + F L + Q+ ++ FD NW + Q +A L Y+HCK 1+

Sbjct: 140 EVLERFFRIAERQQLDLAMTFDIGNWRWQEQAADEAALRLGRYVGYVHCKAVIRNRDGKL 199 

Query: 202 LSKSLFEGEINLTD - LLKSFSNCEYLALEYPTE LEILKRDVQRLIS I SNSQ 251 

++ ++ LL+ F A+EYP + L + +R+L + Q 

Sbjct: 200 VAVPPSAADLQYWQRLLQHFPEGWUJAIEYPLQGDDLLSLSRRHIAAIiARLGQPQ 254

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1503

A DNA sequence (GBSx1590) was identified in S.agalactiae <SEQ ID 4621> which encodes the amino
acid sequence <SEQ ID 4622>. Analysis of this protein sequence reveals the following: 






Possible site: 30 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.0430 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03939 GB:AP001507 unknown conserved protein [Bacillus halodurans]
Identities = 136/511 (26%) , Positives = 234/511 (45%) , Gaps = 29/511 (5%)

Query: 4 LDKKSYDLLFYLLKLEEPETVMAIANALNQSRRKVYYHLEKINDALPSDVPQIVSYPRV- 62

LD++S +L LL + + LN SRR VY LEKIN L + V R 

Sbjct: 3 LDQRSTFILTQLLHARSYLPIQELTQKLNVSRRTVYNDLEKINSWLEEQGLKAVYKVRSQ 62

Query: 63 GILLTEKQKAACRLLLDEVTDYSYVMKSSERLQLSLVSIVVAKDRVTIDRLMQLNDVSRN 122
G++L E+ K L + + Y + ER ++ ++ + + ++ LM VSRN

Sbjct: 63 GLILDERAKEEIPTKLRSLKSWHYEYSAQERKAWWIYLLTRLEPLFLEHLMDRTGVSRN 122 
Query: 123 TILNDLNELRSELAEKEYNLQLQSTKCRGYFLDGHPL SIIQYLYKLLDDIYHNGSS 178

T ++D+ L+ EL ++L L4 + GY + G +++ YL + L 

Sbjct: 123 TTIDDIKCLKDEL--1^FHLALEFERKDGYTISGDETDKRKALVYYLSQALPQQNWETEL 180 

Query: 179 SFIDLFNHKLSQAFGASTYFSKEVLDYFHHYLFISQRSLGKKINSQDGQFMIQILPPILM 238

S I +F L F+ E L + + S++ L KI D L F+L 

Sbjct: 181 SPIRIF---LRTKRDNGRIFTIEELQKVYDVISESEKVL--KIQYTDDVLHSLSLRFLLF 235 

Query: 239 AYRK MRLSPEVQTSLNSDFSLVWQRKEYEIAKELADELEENFQLSLDEIEVGLVA 293 

R +++ P + L KEYE AK ++ +LE+ F + + EV +

Sbjct: 236 MKRVAKGKFI KVHPLEKQVLKGT KEYEAAKVMSFKLEQAFGVHYPDEEVLYLT 288 

Query: 294 MLMLSFRKDRDN-HLESQ-DYDDMRATLTSFLKELEERYHLHFVHKKDLLRQLLTHCKAL 351
+LS + + N +ES+ + ++ +TS +++++FK+L+LHK 
Sbjct: 289 THILSSKINYANGEIESRKESQELTHIVTSMVNDFQICfACVVFEEKELLEKNLFFHIKPA 348

Query: 352 LYRKRYGIFSVNPLTEHIKDKYEELFAITSSSVKLLEKAWQIKLTDDDVAYLTIHLGGEL 411

YR +YG+ N + E IK Y ELF +T V LE+ + D++VA++T+H G + 

Sbjct: 349 FYRIKYGLEVENTIAESIKTSYPELFLLTRKVVHYLERYVGKSVNDNEVAFITMHFVGWM 408


Query: 412 RNSQQSPNK-LKLVIVSDEGIAIQKLLLKQCQRYLTNSDIEAVFTTEQYQSVSDLMHVDM 470 

R P K K +IV G+ + L Q + DI + +Y+ + VD 

Sbjct: 409 RREGTIPTKRKKALIVCANGVGTSQFLKNQLEGLFPAVDIIKTCSIREYEKTP--VEVDF 466 

Query: 471 VVSTSDALESRFPMLVVHPVLTDDDIIRLIR 501

++ST+ E P+ +V+P+LT+ + RL++ 
Sbjct: 467 IISTTSIPEKNVPIFIVNPILTETEKERLLK 497 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4623> which encodes the amino acid 
sequence <SEQ ID 4624>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0745 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 368/548 (67%) , Positives = 456/548 (83%)

Query: 1 MIILDKKSYDLLFYLLKLEEPETVMAIANALNQSRRKVYYHLEKINDALPSDVPQIVSYP 60

M+ILDKKSYDLL YLLKLE PETVMAI ++ALNQSRRKVYY L+KIN ALP V QI+SYP 
Sbjct: 1 MMILDKKSYDLLSYLLKLETPETVMAISHALNQSRRKVYYQLDKINQALPKGVDQIISYP 60


Query: 61 RVGILLTEKQKAACRLLLDEVTDYSYVMKSSERLQLSLVSIVVAKDRVTIDRLMQLNDVS 120

R+GILLT QKAACRLLL+EVTDY+YVMKS ER +LS + I V+ +RVTID+LMQ+NDVS 
Sbjct: 61 RLGILLTADQKAACRLLLEEVTDYNYVMKSDERRRLSSIYIAVSTERVTIDKLMQINDVS 120 

Query: 121 RNTILNDLNELRSELAEKEYNLQLQSTKCRGYFLDGHPLSIIQYLYKLLDDIYHNGSSSF 180

RNTILNDL ELR EL +K+Y +QL +TK RGY+ HP+++IQYLYKLL D+Y G++SF
Sbjct: 121 RNTILNDLTELREELEDKQYKIQLHATKARGYYFGCHPMALIQYLYKLLVDVYQGGNTSF 180

Query: 181 IDLFNHKLSQAFGASTYFSKEVLDYFHHYLFISQRSLGKKINSQDGQFMIQILPFILMAY 240 
ID+FN KLS+ G S YFSK++L YFH YLF+SQ SLGK IN+QD QFM+QILPF+L++Y

Sbjct: 181 IDIFNRKLSEIQGLSWFSKDILTYFHEYLFLSQASLGKTINTQDSQFMLQILPFMLLSY 240 

Query: 241 RKMRLSPEVQTSLNSDFSLVWQRKEYEIAKELADELEENFQLSLDEIEVGLVAMLMLSFR 300 
R MRL E +++L +F L+W+RKEY IA++LA EL NF+L LD+IEV +VAMLMLSFR 
Sbjct: 241 RNMRLDSETKSALKQEFHLIWKRKEYHIAQDLARELYHNFKLHLDDIEVSMVAMLMLSFR 300

Query: 301 KDRDNHLESQDYDDMRATLTSFLKELEERYHLHFVHKKDLLRQLLTHCKALLYRKRYGIF 360 

KD+D+H+ESQDYDDMRAT++ F+ +LE RY LHF HK+DLL++L THCKAL+YRK YGIF 
Sbjct: 301 KDQDHHVESQDYDDMRATISHFIDQLESRYQLHFTHKQDLLKRLTTHCKALVYRKAYGIF 360 

Query: 361 SVNPLTEHIKDKYEELFAITSSSVKLLEKAWQIKLTDDDVAYLTIHLGGELRNSQQSPNK 420
Sbjct: 361 LTOPLTDHVKEKYEELFMTQSC^TILEQDWTISLTDDDIAYLTIHLGGELRHKOTEQEK 420

Query: 421 LKLVIVSDEGIAIQKLLLKQCQRYLTO 480
Sbjct: 421 TKLVIVSDDGIGIQKLLFKQCQRYLANGQIEAVVTTEQYQSVYDLLAVDMIVATTDTLKT 480

Query: 481 RFPMLVVHPVLTDDDIIRLIRFSKKGNCaNSNQFTNELEKTIAQYVKEDSERYVLKSKIE 540
+ PML+V+P+L+DDDII+LIRFSK+G + ++F+ EL K I VK++S+RY L SKIE
Sbjct: 481 KIPMLIVNPILSDDDIIKLIRFSKQGRLSEHSRFSTELTKAIEAVVKDESDRYALVSKIE 540

Query: 541 KLIHQELL 548
KLIH+ELL
Sbjct: 541 KLIHRELL 548





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
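In each alignment header, Identities counts the aligned columns whose midline shows a residue letter, and Positives additionally counts the "+" columns (conservative substitutions). The percentages in parentheses match integer truncation of the ratio in these listings; for example, 124/214 is reported as 57% rather than the rounded 58%. The Python sketch below is illustrative and assumes a midline whose column spacing survived extraction intact. Many midlines in this text lost their spaces, so it is demonstrated only on the short final block (positions 541-548) of the GAS/GBS alignment in Example 1503. The function names are hypothetical.

```python
def midline_counts(midline):
    """Identities = letter columns; positives = letters plus '+' columns."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def truncated_percent(count, length):
    """Integer percentage, truncated (consistent with e.g. 124/214 -> 57%)."""
    return 100 * count // length


query, midline, sbjct = "KLIHQELL", "KLIH+ELL", "KLIHRELL"
identities, positives = midline_counts(midline)

# In an ungapped block the identity count can also be read straight off the sequences.
assert identities == sum(q == s for q, s in zip(query, sbjct))

print(identities, positives, len(midline))          # 7 8 8
print(truncated_percent(identities, len(midline)))  # 87
print(truncated_percent(368, 548), truncated_percent(456, 548))  # 67 83
```

The last line reproduces the 67% and 83% shown in the header of that GAS/GBS alignment.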

Example 1504 

A DNA sequence (GBSx1591) was identified in S.agalactiae <SEQ ID 4625> which encodes the amino
acid sequence <SEQ ID 4626>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2692 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC77149 GB:AE000491 orf, hypothetical protein [Escherichia coli K12]
Identities = 211/363 (58%) , Positives = 270/363 (74%) , Gaps = 9/363 (2%) 

Query: 1 MPNVKDITRESWILSTFPEWGTWLNEEIEEEWAEGNFAMWWLGNCGWIKTPGGANVVM 60
M VK ITRESWILSTFPEWG+WLNEEIE+E VA G FAMWWLG G+W+K+ GG NV +
Sbjct: 3 MSKVKSITRESWILSTFPEWGSWLNEEIEQEQVAPGTFAMWWLGCTGIWLKSEGGTNVCV 62



D W GK + M +GHQM MAGV+ KLQPNLR P V+DPFAI ++D L +H H+

DHID+N AAA++ N D V F+GP C ++W WGVP+ER IV+KPG+ + KDI++ A

+++FDRT L+TLP D + AG V + M +AVNY+F+TPGG++YH DSH+SN
Sbjct: 182 LDAFDRTALITLPADQ KfiAG--VLPDGMDDRAVNYLFKTPGGSLYHSGDSHYSN 233

Y+AKHG +++IDVA+ +YG+NP GI DKMTS D+LRM E L AKV+IP H+DIWSNF A

EI LW+M+K+RL+Y F PFIW+VGGK+T+P DKD EYH+PRGFDDCF E ++ FK

+ L
A related DNA sequence was identified in S.pyogenes <SEQ ID 4627> which encodes the amino acid
sequence <SEQ ID 4628>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3298 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 315/363 (86%) , Positives = 348/363 (95%) 



Query: 1 MPNVKDITRESWILSTFPEWGTWLNEEIEEEWAEGNFAMWWLGNCGWIKTPGGANVVM 60
M V+DITRESWIL+TFPEWGTWLNEEIE+EW NFAMWWLGNCG+WI KTPGGANVVM
Sbjct: 1 MTKVQDITRESWILNTFPEWGTWLNEEIEQEWPADNFAMWWLGNCGIWIKTPGGANVVM 60

Query: 61 DLWSNRGKSTKKVKDMVRGHQMANMAGVRKLQPNLRAQPMVIDPFAINELDYYLVSHFHS 120
DLWSNRGK+TK+VKDMVRGHQMANMAG RKLQPNLRAQPMVIDPF INELDYYLVSH+HS
Sbjct: 61 DLWSNRGKATKQVKDMVRGHQMANMAGARKLQPNLRAQPMVIDPFMINELDYYLVSHYHS 120

Query: 121 DHIDINTAAAIINNPNLDHVKFVGPYECGEIWKKWGVPEERIIVIKPGESFEFKDIKVTA 180
DHIDINTAAAIINNP L+HVKFVGPYECGE+WK WGVP++RI+++KPG+SFEFKDIK+TA
Sbjct: 121 DHIDINTAAAIINNPKLNHVKFVGPYECGEVWKNWGVPKDRIMILKPGDSFEFKDIKITA 180

Query: 181 VESFDRTCLVTLPVDGAEEHDGELAGLAVTDEEMARKAVNYIFETPGGTIYHGADSHFSN 240
VESFDRTCLVTLP+ GA+ DG+LAGLA+TD++MARKAVNYIFETPGGTIYHGADSHFSN
Sbjct: 181 VESFDRTCLVTLPIQ£MJ^rX5DIAGIAIT^ 240

Query: 241 YFAKHGKDYKIDVAINNYGDNPVGIQDKMTSIDLLRMAENLRAKVIIPVHYDIWSNFMAS 300
YFAKHG+DY IDV +N+YG+NP+GIQDKMTS+DLLRMAENLRAKV+IPVHYDIWSNFMAS
Sbjct: 241 YFAKHGRDYDIDVVLNHYGENPIGIQDKMTSVDLLRMAENLRAKVVIPVHYDIWSNFMAS 300

Query: 301 TDEILQLWKMRKERLQYDFHPFIWEVGGKYTYPQDKDRIEYHHPRGFDDCFEQESNIQFK 360
TDEIL+LWKMRKERLQYDFHPFIWEVGGKYTYPQD++RIEYHHPRGFDDCF ++SNIQFK
Sbjct: 301 TDEILELWKMRKERLQYDFHPFIWEVGGKYTYPQDQNRIEYHHPRGFDDCFLEDSNIQFK 360

Query: 361 ALL 363
ALL
Sbjct: 361 ALL 363





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1505 

A DNA sequence (GBSx1592) was identified in S.agalactiae <SEQ ID 4629> which encodes the amino
acid sequence <SEQ ID 4630>. Analysis of this protein sequence reveals the following:

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3988 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10145> which encodes amino acid sequence <SEQ ID 
10146> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 
>GP:BAA18808 GB:D90917 hypothetical protein [Synechocystis sp.]
Identities = 358/785 (45%) , Positives = 494/785 (62%) , Gaps = 15/785 (1%) 

Query: 22 IjEKLDAWWRAAOTISAAQMYLKDNPLLRREL^ 81 
L ++ +WRAANY++ +YL+DNPLLR L +K +GHWG+ PG +F+Y HLNR

Sbjct: 44 IMQMHGFWRAANyiAVGMIYLRDNPLLREPLQPEQIKHRLLGHWGSSPGISFLYTHLiNRI 103 

Query: 82 INKYDLDMFYIEGPGHGGQVMVSNSYLDGSYTELNPNIEQTEDGFKQLCKIFSFPCGIAS 141 
I K+D DM Y+ GPGHG + YL+GSY+ + EDG K+ K FSFP GI S 

Sbjct: 104 IRKFDQDMLYMVGPGHGAPGFLGPCYLEGSYSRFFAECSEDEDGMKRFFKQFSFPGGIGS 163

Query: 142 HAAPETPGSIHEGGELGYALSHATGAILDNPDVIAATVIGDGEGETGPLMAGWLSNTFIN 201 

H PETPGSIHEGGELGY LSHA GA DNP++I + GDGE ETGPL W SN FIN 
Sbjct: 164 HCTPETPGSIHEGGELGYCLSHAYGAAFDNPNLIWGIAGDGESETGPIATSWHSNKFIN 223 


Query: 202 PVNDGAVLPIFYLNGGKIHNPTIFERKTDEELSQFFEGLGWKPIFADWELSEDHAAAHA 261 

P+ DGAVLP+ +LNG KI+NP++ R + EEL FEG G+ P F + D + H 

Sbjct: 224 PIRDGAVLPVLHLNGYKINNPSVLSRISHEELKALFEGYGYTPYFVE GSDPESMHQ 279

Query: 262 LFAEKLDQAIQEIICriQSEARQKPAEEAIQAKFPVLVARIPKGWTGPKAWEGTPIEGGFR 321

A LD + EI IQ EAR A++ ++P++V R PKGWTGP +G +EG +R 

Sbjct: 280 AMAATLDHCVSEIHQIQQEARSTGI--AVRPRWPMVVMRTPKGWTGPDYVDGHKVEGFWR 337

Query: 322 AHQVPIPVDAHHMEHVDSLLSWLQSYRPEELFDENGKIVDEIAAISPKGDRRMSMNPITN 381
+HQVP+ + H+ L +W++SY+PEELFDE G + AI+P+GD+R+ P N

Sbjct: 338 SHQVPMGGMHENPAHLQQLEAWMRSYKPEELFDEQGTLKPGFKAIAPEGDKRLGSTPYAN 397 

Query: 382 AGIV-KAMDTADWKKFALDINVPGQIMAQDMIEFGKYAADLVDANPDNFRIFGPDETKSN 440 
G++ + + D++++ +D++ PG I A + G + D++ N NFR+FGPDE SN 
Sbjct: 398 GGLLRRGLKMPDFRQYGIDVDQPGTIEAPNTAPLGVFLRDVMANNMTNFRLFGPDENSSN 457

Query: 441 RLQEVFTRTSRQWLGRRKPDYDEA--LSPAGRVIDSQLSEHQAEGFLEGYVLTGRHGFFA 498 

+ L V+ + + W+ + + LSP GRV++ LSEH EG+LE Y+LTGRHGFFA

Sbjct: 458 KLHAVYEVSKKFWIAEYLEEDQDGGELSPDGRVME-MLSEHTLEGWLEAYLLTGRHGFFA 516


Query: 499 SYESFLRVVDS^TO , QHFKWLRKSKTHTTWRKNYPALNLIAASTVFQQDHNGYTHQDPGIL 558 

+YESF V+ SMV QH KWL + H WR + +LN++ STV+ +QDHNG+THQDPG L 
Sbjct: 517 TYESFAHVITSMVNQHAKWLDICR-HLNWRADISSLNILMTSTVWRQDHNGFTHQDPGFL 575

Query: 559 THLAEKTPEYIREYLPADTNSLLAVMDKAFKAEDKINLIVTSKHPRPQFYSIAEAEELVA 618

+ K+P+ +R YLP D NSLL+V D ++++ IN+IV K Q+ + A 
Sbjct: 576 DVILNKSPDVVRIYLPPDVNSLLSVADHCLQSKNYINIIVCDKQAHLQYQDMTSAIRNCT 635

Query: 619 EGYKVIDWASOTSLNQEPDWFAAAGTEPNLFJU^AAISILHKAFPELKIRFVNVLDILKL 678 
+G + +WASN EPDW AAAG P EALAA ++L + FP L+IRFV+V+D+LKL

Sbjct: 636 KGVDIVffiWASN-DAGTEPDVVMAAAGDIPTKEALAATAMLRQFFPNLRIRFVSVIDLLKL 694

Query: 679 RHPSQDARGLSDEEFNKVFTTDKPVIFAFHGYEDMIRDIFFSRHNH-NLHTHGYRENGDI 737 
+ S+ GLSD +F+ +FTTDKP+IF FH Y +1 + + R NH NLH GY+E G+I 
Sbjct: 695 QPESEHPHGLSDRDFDSLFTTDKPIIFNFHAYPWLIHRLTYRRTNHGNLHVRGYKEKGNI 754

Query: 738 TTPFDMRVMSELDRFHLAQDA--ALASLGNKAQAFSDEMNQMVAYHKDYIREHGDDIPEV 795

TP D+ + +++DRF LAD LL ++M +Y EHG D+PE+ 

Sbjct: 755 NTPMDLAIQNQIDRFSLAIDVIDRLPQLRVAGAHIKEMLKDMQIDCTNYAYEHGIDMPEI 814 






Query: 796 QNWKW 800 
NW+W 

Sbjct: 815 VNWRW 819 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1506

A DNA sequence (GBSx1593) was identified in S.agalactiae <SEQ ID 4631> which encodes the amino
acid sequence <SEQ ID 4632>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3509 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAF37878 GB:AF234619 OpuAA [Lactococcus lactis] 
Identities = 274/402 (68%) , Positives = 338/402 (83%) 

Query: 5 LEVKNLTKIFGKKQKAALEMVKQGKSKTEILEKTGATVGVYDASFEIKEGEIFVIMGLSG 64

+++++LTKIFGK+ K AL MV++G+ K EIL+KTGATVGVYD +FEI EGEIFVIMGLSG
Sbjct: 5 IKIEHLTKIFGKRIKTALTMVEKGEPKNEILKKTGATVGVYDTNFEINEGEIFVIMGLSG 64 

Query: 65 SGKSTLVRMLNRLIDPSSGNIYLDGKDIAKMNVEDLRNIRRHDINMVFQNFGLFPHRTIL 124

SGKSTL+R+LNRLI+P+SG I++D +D+A +N EDL +RR ++MVFQNFGLFPHRTIL 
Sbjct: 65 SGKSTLLRLLNRLIEPTSGKIFIDNQDVATLNKEDLLQVRRKTMSMVFQNFGLFPHRTIL 124 

Query: 125 ENTEFGLEMRGVSKEERTTLAEKALDNAGLLPFKDQYPSQLSGGMQQRVGLARALANSPK 184
ENTE+GLE++ V KEER AEKALDNA LL FKDQYP QLSGGMQQRVGLARALAN P+

Sbjct: 125 ENTEYGLEVQNVPKEERRKRAEKALDNANLLDFKDQYPKQLSGGMQQRVGLARALANDPE 184 

Query: 185 ILLMDEAFSALDPLIRREMQDELLDLQDTNKQTIIFISHDLNEALRIGDRIALMKDGEIM 244
ILLMDEAFSALDPLIRREMQDELL+LQ ++TIIF+SHDLNEALRIGDRIA+MKDG+IM
Sbjct: 185 ILLMDEAFSALDPLIRREMQDELLELQAKFQKTIIFVSHDLNEALRIGDRIAIMKDGKIM 244

Query: 245 QIGTGEEILTNPANDFVREFVEDVDRSKVLTAQNIMIKPLTTVLEIDGPQVALTRMHREE 304

QIGTGEEILTNPAND+V+ FVEDVDR+ KV+TA+NIMI LTT +++DGP VAL +M EE 
Sbjct: 245 QIGTGEEILTNPANDYVKTFVEDVDRAKVITAENIMIPALTTNIDVDGPSVALKKMKTEE 304

Query: 305 VSMLMATNRRRQLLGSLTADAAIEARKKDLPLSEVIDKDVVTVSKDTVITDIMPLIYDSS 364

VS LMA +++RQ G +T++ AI ARK + PL +V+ DV TVSK+ ++ DI+P+IYD+ 
Sbjct: 305 VSSLMAVDKKRQFRGVVTSEQAIAARKNNQPLKDVMTTDVGTVSKEMLVRDILPIIYDAP 364

Query: 365 APIAVTDDNDRLLGVIIRGRVIEALANVQDETVVESPKETVE 406

P+AV DDN L GV+IRG V+EALA++ DE VE ++ E
Sbjct: 365 TPLAVVDDNGFLKGVLIRGSVLEALADIPDEDEVEEIEKEEE 406

A related DNA sequence was identified in S.pyogenes <SEQ ID 4633> which encodes the amino acid
sequence <SEQ ID 4634>. Analysis of this protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3761 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 344/395 (87%), Positives = 374/395 (94%)

Query: 1 MINILEVKNLTKIFGKKQKAALEMVKQGKSKTEILEKTGATVGVYDASFEIKEGEIFVIM 60

M ILEVK+L+KIFGKKQKAALEMVK GK+K+EI +KTGATVGVYDASFE+K+GEIFVIM
Sbjct: 1 METILEV^a^LSKIFGKKQKAALEMVKTGKNKSEIFKKTGATVGVYDASFEVKKGEIFVIM 60

Query: 61 GLSGSGKSTLVRMLNRLIDPSSGNIYLDGKDIAKMNVEDLRNIRRHDINMVFQNFGLFPH 120

GLSGSGKSTLVRMLNRLI+PS+G+I L+GKDI+ M+ + LR +RRHDINMVFQ+F LFPH
Sbjct: 61 GLSGSGKSTLVRMLNRLIEPSAGSILLEGKDISTMSADQLREVRRHDINMVFQSFMLFPH 120

Query: 121 RTILENTEFGLEMRGVSKEERTTLAEKALDNAGLLPFKDQYPSQLSGGMQQRVGLARALA 180

+TILENTEFGLE+RGV KEER LAEKALDN+GLL FKDQYP+QLSGGMQQRVGLARALA 
Sbjct: 121 KTILENTEFGLELRGVPKEERQRLAEKALDNSGLLDFKDQYPNQLSGGMQQRVGLARALA 180

Query: 181 NSPKILLMDEAFSALDPLIRREMQDELLDLQDTNKQTIIFISHDLNEALRIGDRIALMKD 240 

NSPKILLMDEAFSALDPLIRREMQDELLDLQD+ KQTIIFISHDLNEALRIGDRIALMKD 
Sbjct: 181 NSPKILLMDEAFSALDPLIRREMQDELLDLQDSMKQTIIFISHDLNEALRIGDRIALMKD 240

Query: 241 GEIMQIGTGEEILTNPANDFVREFVEDVDRSKVLTAQNIMIKPLTTVLEIDGPQVALTRM 300

G+IMQIGTGEEILTNPANDFVREFVEDVDRSKVLTAQNIMIKPLTT +E+DGPQVAL RM 
Sbjct: 241 GQIMQIGTGEEILTNPANDFVREFVEDVDRSKVLTAQNIMIKPLTTTVELDGPQVALNRM 300

Query: 301 HREEVSMLMATNRRRQLLGSLTADAAIEARKKDLPLSEVIDKDVVTVSKDTVITDIMPLI 360

H EEVSMLMATNRRRQL+GSLTADAAIEARKK LPLSEVID+DV TVSKDT+ITDI+PLI 
Sbjct: 301 HNEEVSMLMATNRRRQLVGSLTADAAIEARKKGLPLSEVIDRDVRTVSKDTIITDILPLI 360 

Query: 361 YDSSAPIAVTDDNDRLLGVIIRGRVIEALANVQDE 395 

YDSSAPIAVTDDN+RLLGVIIRGRVIEALAN+ DE
Sbjct: 361 YDSSAPIAVTDDNNRLLGVIIRGRVIEALANISDE 395
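
The percentages in each alignment header are whole-number ratios of the counts shown. As a reading aid only, the following minimal Python sketch reproduces them under the assumption that they are truncated toward zero rather than rounded; that assumption is consistent with, for example, Positives = 374/395 (94%) in the GAS/GBS alignment above, where rounding would give 95%. The function name `blast_percent` is hypothetical.

```python
def blast_percent(count: int, length: int) -> int:
    """Return count/length as a whole-number percentage.

    Assumption (not stated in the source): the reported figure is
    truncated toward zero, i.e. computed with integer division.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // length


# Counts taken from the GAS/GBS alignment header in this example.
identities = blast_percent(344, 395)  # 34400 // 395 -> 87
positives = blast_percent(374, 395)   # 37400 // 395 -> 94
print(f"Identities = 344/395 ({identities}%), Positives = 374/395 ({positives}%)")
```

Running it prints `Identities = 344/395 (87%), Positives = 374/395 (94%)`, matching the header line for this alignment.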

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1507 

A DNA sequence (GBSx1594) was identified in S.agalactiae <SEQ ID 4635> which encodes the amino
acid sequence <SEQ ID 4636>. This protein is predicted to be OpuABC (opuAB). Analysis of this protein 
sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.5267 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF37879 GB:AF234619 OpuABC [Lactococcus lactis]
Identities = 345/578 (59%), Positives = 429/578 (73%), Gaps = 8/578 (1%)

Query: 1 MENLLQHKLPVAPFVESTTNWITKTFSGLFDFIQTIGNALMDWMTKTLLFINPLLFIVLI 60

M +L ++P+A +V S T+WIT TFS FD IQ G LM+ +T L + L I ++
Sbjct: 1 MIDLAIGQVPIANWVSSATDWITSTFSSGFDVIQKSGTVLMNGITGALTAVPFWLMIAVV 60

Query: 61 TIAVFFLAKKKWQLPTFTFIGLLFIYNQGLWEQLINTFNLVLVASLISIIIGVPLGIWMA 120

TI ++ KK P FTFIGL I NQGLW L++T LVL++SL+SIIIGVPLGIWMA
Sbjct: 61 TILAILVSGKKIAFPLFTFIGLSLIANQGLWSDLMSTITLVLLSSLLSIIIGVPLGIWMA 120

Query: 121 KSDKVKQVVNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASVVFALPPTVRFTNLAIR 180

KSD V ++V PILDFMQTMP FVYLIPAVAFFGIG+VPGVFASV+FALPPTVR TNL IR
Sbjct: 121 KSDLVAKIVQPILDFMQTMPGFVYLIPAVAFFGIGVVPGVFASVIFALPPTVRMTNLGIR 180

WO 02/34771

PCT/GB01/04789

-1665-

Query: 181 EIPLELIEASDSFGSTVKQKLFKVELPLAKNTIMAGINQTMMLALSMVVTGSMIGAPGLG 240
++ EL+EA+DSFGST +QKLFK+E PLAK TIMAG+NQT+MLALSMVV SMIGAPGLG
Sbjct: 181 QVSTELVEAADSFGSTARQKLFKLEFPLAKGTIMAGVNQTIMLALSMVVIASMIGAPGLG 240

Query: 241 REVLSALQHADIGTGFVSGLSLVILAIVLDRVSQFFNSKPGEKQAKTSKVKKW---VGLG 297 

R VL+A+Q ADIG GFVSG+SLVILAI++DR +Q N P EKQ + VKKW + L
Sbjct: 241 RGVLAAVQSADIGKGFVSGISLVILAIIIDRFTQKLNVSPLEKQGNPT-VKKWKRGIALV 299

Query: 298 ALALFILAALGRIVVNMTSGNEAKGQKVKIAYVQWDSEVASTNVIAEVLKSKGYDVELTP 357

+L 1+ A M+ G A +KV + Y+ WDSEVAS NV+ + +K G+DV+ T 

Sbjct: 300 SLLALI IGAFS GMSFGKTASDKKVDLVYMNWDSEVASINVLTQAMKEHGFDVKTTA 355 

Query: 358 LDNAVMWQTVANGNADFTTSAWLPKTHGQYFNKYKNSLDDLGPHVENVKIGLVVPKYMNV 417

LDNAV WQTVANG AD SAWLP TH + KY S+D LGP+++ K+G VVP YMNV
Sbjct: 356 LDNAVAWQTVANGQADG^SAWLPNTHKTQWQKYGKSVDLLGPNLKGAKVGFVVPSYMNV 415

Query: 418 NSIEELSNQADKQITGIEPGAGIMKSAKQSLKDYPNLSSWKLLSASTGAMTTTLGKAIKN 477

NSIE+L+NQA+K ITGIEPGAG+M +++++L Y NL WKL+ +S+GAMT LG+AIK 
Sbjct: 416 NSIEDLTNQANKTITGIEPGAGVMAASEKTLNSYDNLKDWKLVPSSSGAMTVALGEAIKQ 475

Query: 478 KDQVVITGWSPHVMFAKYDLKYLKDPKKSFGGEEHINTIARKNLKKDMPKVYKIIDKFKW 537
+VITGWSPHVMF KYDLKYL DPK + G E+INTI RK LKK+ P+ YK++DKF W

Sbjct: 476 HKDIVITGWSPHVMFNKYDLKYLADPKGTMGTSENINTIVRKGLKKENPEAYKVLDKFNW 535

Query: 538 TKEDMESIMLDMDKGMEPAKAAQKWIKNHKKEVSEWTK 575 
T +DME++MLD+ G P +AA+ WIK+H+KEV +W K 
Sbjct: 536 TTKDMEAVMLDIQNGKTPEEAAKNWIKDHQKEVDKWFK 573

A related DNA sequence was identified in S.pyogenes <SEQ ID 4637> which encodes the amino acid 
sequence <SEQ ID 4638>. Analysis of this protein sequence reveals the following: 

Possible site: 47 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.4545 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAF37879 GB:AF234619 OpuABC [Lactococcus lactis] 
Identities = 340/571 (59%), Positives = 418/571 (72%), Gaps = 8/571 (1%)

Query: 8 KLPVAQLVEQLTEWLTKTFSGLFDIMQVVGSFLMDWMTKTLLFIHPLLFIVLVTAGMFFL 67

++P+A V T+W+T TFS FD++Q G+ LM+ +T L + LI +VT + 
Sbjct: 8 QVPIANWVSSATDWITSTFSSGFDVIQKSGTVLMNGITGALTAVPFWLMIAVVTILAILV 67

Query: 68 AKKKWPLPTFTLLGLLFIYNQGLWKQLMNTFTLVLVASLISVLIGIPLGIWMAKNATVRQ 127

+ KK P FT +GL I NQGLW LM+T TLVL++SL+S++IG+PLGIWMAK+ V + 
Sbjct: 68 SGKKIAFPLFTFIGLSLIANQGLWSDLMSTITLVLLSSLLSIIIGVPLGIWMAKSDLVAK 127

Query: 128 IVNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASVIFALPPTVRFTNLAIRDIPTELI 187

IV PILDFMQTMP FVYLIPAVAFFGIG+VPGVFASVIFALPPTVR TNL IR + TEL+ 
Sbjct: 128 IVQPILDFMQTMPGFVYLIPAVAFFGIGVVPGVFASVIFALPPTVRMTNLGIRQVSTELV 187

Query: 188 EASDAFGSTGKQKLFKVELPLAKNTIMAGVNQTMMLALSMVVTGSMIGAPGLGREVLSAL 247
EA+D+FGST +QKLFK+E PLAK TIMAGVNQT+MLALSMVV SMIGAPGLGR VL+A+

Sbjct: 188 EAADSFGSTARQKLFKLEFPLAKGTIMAGVNQTIMLALSMVVIASMIGAPGLGRGVLAAV 247

Query: 248 QHADIGSGFVSGLALVILAIVLDRMTQLFNSKPQEKAKAGKTNKW---IGLAALAVFLIA 304

Q ADIG GFVSG++LVILAI++DR TQ N P EK KW I L +L +1 

Sbjct: 248 QSADIGKGFVSGISLVILAIIIDRFTQKLNVSPLEKQGNPTVKKWKRGIALVSLLALIIG 307

Query: 305 ;^GRGIMAMTSG^W^KGETVNIAWQWDSEVASTHVIffi 364

A M+ G + V++ Y+ WDSEVAS +V+ + +K G+ V T LDNAV WQ 

Sbjct: 308 AFS GMSFGKTASDKK^LVYI^DSEWASIimiTQAMKEHGFDVKTTALDNAVAWQ 363

Query: 365 TVANGNADFSTSAWLPVTHGQQYQKYKSKLDDLGPNLKGTKLGLAVPKYMTDVNSIEDLS 424

TVANG AD SAWLP TH Q+QKY +D LGPNLKG K+G VP YM +VNSIEDL+ 
Sbjct: 364 TVANGQADG^^VSAWLPNTHKTQWQKYGKSVDLLGPNLKGAKVGFVVPSYM-NVNSIEDLT 422

Query: 425 KQADQKITGIEPGAGIMAAAQKTLKEYHNLSSWELVAASTGAMTTSLDQAIKKKDPIVVT 484

QA++ ITGIEPGAG+MAA++KTL Y NL W+LV +S+GAMT +L +AIK+ IV+T 
Sbjct: 423 NQANKTITGIEPGAGVMAASEKTLNSYDNLKDWKLVPSSSGAMTVALGEAIKQHKDIVIT 482 

Query: 485 AWSPHVMFAKYDLKYLKDPKEIFGSTENINTIARKGLKKELPNVYKIIDKFHWTQKDMEA 544
WSPHVMF KYDLKYL DPK G++ENINTI RKGLKKE P YK++DKF+WT KDMEA

Sbjct: 483 GWSPHVMFNKYDLKYLADPKGTMGTSENINTIVRKGLKKENPEAYKVLDKFNWTTKDMEA 542

Query: 545 VMLDINKGMSPEAAAKKWVEANKSKVSSWTK 575
VMLDI G +PE AAK W++ ++ +V W K
Sbjct: 543 VMLDIQNGKTPEEAAKNWIKDHQKEVDKWFK 573

An alignment of the GAS and GBS proteins is shown below. 

Identities = 439/576 (76%), Positives = 513/576 (88%), Gaps = 2/576 (0%)

Query: 1 MENLLQHKLPVAPFVESTTNWITKTFSGLFDFIQTIGNALMDWMTKTLLFINPLLFIVLI 60

+E +LQ KLPVA VE T W+TKTFSGLFD +Q +G+ LMDWMTKTLLFI+PLLFIVL+ 
Sbjct: 1 LETILQTKLPVAQLVEQLTEWLTKTFSGLFDIMQVVGSFLMDWMTKTLLFIHPLLFIVLV 60

Query: 61 TIAVFFLAKKKWQLPTFTFIGLLFIYNQGLWEQLINTFNLVLVASLISIIIGVPLGIWMA 120
T +FFLAKKKW LPTFT +GLLFIYNQGLW+QL+NTF LVLVASLIS++IG+PLGIWMA

Sbjct: 61 TAGMFFLAKKKWPLPTFTLLGLLFIYNQGLWKQLMNTFTLVLVASLISVLIGIPLGIWMA 120

Query: 121 KSDKVKQVVNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASVVFALPPTVRFTNLAIR 180
K+ V+Q+VNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASV+FALPPTVRFTNLAIR
Sbjct: 121 KNATVRQIVNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASVIFALPPTVRFTNLAIR 180

Query: 181 EIPLELIEASDSFGSTVKQKLFKVELPLAKNTIMAGINQTMMLALSMVVTGSMIGAPGLG 240

+IP ELIEASD+FGST KQKLFKVELPLAKNTIMAG+NQTMMLALSMVVTGSMIGAPGLG
Sbjct: 181 DIPTELIEASDAFGSTGKQKLFKVELPLAKNTIMAGVNQTMMLALSMVVTGSMIGAPGLG 240

Query: 241 REVLSALQHADIGTGFVSGLSLVILAIVLDRVSQFFNSKPGEKQAKTSKVKKWVGLGALA 300

REVLSALQHADIG+GFVSGL+LVILAIVLDR++Q FNSKP EK AK K KW+GL ALA 
Sbjct: 241 REVLSALQHADIGSGFVSGLALVILAIVLDRMTQLFNSKPQEK-AKAGKTNKWIGLAALA 299

Query: 301 LFIIAALGRIVVNMTSGNEAKGQKVKIAYVQWDSEVASTNVIAEVLKSKGYDVELTPLDN 360

+F++AALGR ++ MTSG KG+ V IAYVQWDSEVAST+VIAEVLK++GY V LTPLDN 
Sbjct: 300 VFLIAALGRGIMAMTSGMADKGETVNIAYVQWDSEVASTHVIAEVLKNEGYHVTLTPLDN 359

Query: 361 AVMWQTVANGNADFTTSAWLPKTHGQYFNKYKNSLDDLGPHVENVKIGLVVPKYM-NVNS 419
AVMWQTVANGNADF+TSAWLP THGQ + KYK+ LDDLGP+++ K+GL VPKYM +VNS

Sbjct: 360 AVMWQTVANGNADFSTSAWLPVTHGQQYQKYKSKLDDLGPNLKGTKLGLAVPKYMTDVNS 419

Query: 420 IEELSNQADKQITGIEPGAGIMKSAKQSLKDYPNLSSWKLLSASTGAMTTTLGKAIKNKD 479
IE+LS QAD++ITGIEPGAGIM +A+++LK+Y NLSSW+L++ASTGAMTT+L +AIK KD
Sbjct: 420 IEDLSKQADQKITGIEPGAGIMAAAQKTLKEYHNLSSWELVAASTGAMTTSLDQAIKKKD 479

Query: 480 QVVITGWSPHVMFAKYDLKYLKDPKKSFGGEEHINTIARKNLKKDMPKVYKIIDKFKWTK 539

+V+T WSPHVMFAKYDLKYLKDPK+ FG E+INTIARK LKK++P VYKIIDKF WT+
Sbjct: 480 PIVVTAWSPHVMFAKYDLKYLKDPKEIFGSTENINTIARKGLKKELPNVYKIIDKFHWTQ 539

Query: 540 EDMESIMLDMDKGMEPAKAAQKWIKNHKKEVSEWTK 575

+DME++MLD++KGM P AA+KW++ +K +VS WTK 
Sbjct: 540 KDMEAVMLDINKGMSPEAAAKKWVEANKSKVSSWTK 575

A related GBS gene <SEQ ID 8827> and protein <SEQ ID 8828> were also identified. Analysis of this
protein sequence reveals the following:
Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: -6.57 
GvH: Signal Score (-7.5): -5.37 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 6 value: -10.67 threshold: 0.0
modified ALOM score: 2.63



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5267 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

ORF00938(322 - 2025 of 2325) 

GP|7188801|gb|AAF37879.1|AF234619_2|AF234619(8 - 573 of 573) OpuABC {Lactococcus lactis}
%Match =44.7 

%Identity =60.2 %Similarity =75.7 

Matches = 342 Mismatches = 136 Conservative Sub.s = 88 

255 285 315 345 375 405 435 465 

ANVQDETWESPKETVEA* *RGQI ILENLLQHKLPvAPFvESTTNWITKTFSGLFDFIQTIGNALMDWMTKTLLFINPLL 

: = hl =1 I hill III II II I Ih =1 I = I 
MIDLAIGQVPIANWSSATDWITSTFSSGFDVIQKSGTVLMNGITGALTAVPFWL 
10 20 30 40 50 

495 525 555 585 615 645 675 , 705 

FIVLITIAVFFLAKKKWQLPTFTFIGLLFIYNQGLlffiQLINTFNLVLVASLISIIIGVPLGIWMAKSDKVKQVVNPILDF 

I ::|| = = = II »l I I I I I I :| Mill h = l I I h : I h I I I I I I I I I I I I I I I I I = = l lllll 
MIAVVTILAILVSGKKIAFPLFTFIGLSLIANQGLWSDLMSTITLVLLSSLLSIIIGVPLGIWMAKSDLVAKIVQPILDF 
70 80 90 100 110 120 130 

735 765 795 825 855 885 915 945 

MQTMPAFVYLIPAVAFFGIGMVPGVFASWFALPPTTOFTNIAIREIPLELIFASDSFGSTVKQKLFKVELPIjAK^ 

lllll lllllllllllllhlllllllhlimill III Ih: Ihlhllllll :|lllh|:|lll llll 
MQTMPGFVYLIPAVAFFGIGWPGVFASVIFALPPTVRMTNLGIRQVSTELVEAADSFGSTARQKLFKLEFPLAKGTIMA 
150 160 170 180 190 200 210 

975 1005 1035 1065 1095 1125 1155 1185 

GINQTMMLALSMVOTGSMIGAPGLGREVLSALQHADIGTGFVSGLSLVIIAIV^ 

hllhllllllll lllimill Ihhl llll llllhllllll|::ll =1 =1 I III llll ' 

GTOQTIMLALSMWIASMIGAPGLGRGVLAAVQSADIGKGFVSGISLVILAIIIDRFTQKLNVSPLEKQG-NPTVKKW-K 
230 240 250 260 270 280 290 

1215 1245 1275 1305 1335 1365 1395 1425 

LGALALFIIAALGRIWNMTSGNEAKGQKVKIAYVQWDSOT 

I = :|| = hi I =11 = 1= lllllll Ih = =1 hlh I lllll lllllll II 

RGIALVSLLALIIGAFSGMSFGKTASDKKOTLVYMNWDSEVASINVLTQAMKEHGFDVKTTALDNAVAWQTVAN 

310 320 330 340 350 360 370 

1455 1485 1515 1545 1575 1605 1635 1665 

TSAWLPKTHGQYFNKYKNSLDDLGPHVENVKIGLWPKYMNVNSIEELSNQADKQITGIEPGAGIMKSAKQSLKDYPNLS 

lllll II : II hi III::: hhlll I I I I I I I h I : I I h I 111111111 = 1 :: = : = ! I II 
. VSAWLPNTHKTQWQKYGKSVDLLGPNLKGAKWGFWPSYMNvNSIEDLTNQANKTITGIEPGAGVMAASEKTI^ 

390 400 410 420 430 440 450 

1695 1725 1755 1785 1815 1845 1875 1905 

SWKLLSASTGAMTTTLGKAIKNKDQWITGWSPHWMFAKYDLKYLK^
DWKLVPSSSGAMTVALGEAIKQHKDIVITGWSPH^

470 480 490 500 510 520 530 

1935 1965 1995 2025 2055 2085 2115 2145 

KOTKEDMESIMLDMDKG^PAKMQKWIK^KKEVSEOT^ 

II .-Ill-Ill: I I :|h |||.-|'IM =1 I 
NWTTKDMEAVMLDIQNGKTPEEAAKNWIKDHQKEVDKWFK 

550 560 570 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1508 

A DNA sequence (GBSx1596) was identified in S.agalactiae <SEQ ID 4639> which encodes the amino
acid sequence <SEQ ID 4640>. This protein is predicted to be a transposase. Analysis of this protein 
sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.65 Transmembrane 223 - 239 ( 223 - 240) 



Final Results 

bacterial membrane Certainty=0.1659 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Related GBS nucleic acid sequences <SEQ ID 10057>, <SEQ ID 10031> and <SEQ ID 10801>, which encode
amino acid sequences <SEQ ID 10058>, <SEQ ID 10032> and <SEQ ID 10802> respectively, were also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA50689 GB:X71844 putative transposase [Clostridium perfringens] 
Identities = 94/364 (25%), Positives = 160/364 (43%), Gaps = 35/364 (9%)

Query: 8 KHKHLTLLDRlTOIQSGLDRGETFKftlGLNLLKHPTTIAKEVKRW--KQLRESTKDCLDCP 65

K+KHL + +R ++ L G + L + T+ E++R KQ+++ + + 

Sbjct: 12 KNKHMMKERMIVEIRLKDGFSAYKOTKEI^PIN^ 71 

Query: 66 LLRKAPYVCNGCPKRRINCGyKKTFyiAKQAQRNYEKLLVESREGIPLNKETFWKIDRVL 125 

+A Y N + + N YK ++ K +V+ K W +D + 

Sbjct: 72 DTGEAVYKKN RLKSNRKYKLL ECSDFIKYWDKV KNDHWSLDACV 116 

Query: 126 SNGVKKGQRIYHILKTNDLEVSSSTVYRHIKKGYLSITPIDLPRAVKFKKRRKSTLPPIP 185 

G+ ++ + +S+ T+Y ++ G L I IDLP K + +KST 
Sbjct: 117 GEALHSSRFSPSQIISTKTLYNYVULGLLPIKNIDLP--AKLHRNKKSTRVRNN 168 

Query: 186 KAIKEGRRYEDFIEHM-NQSEIiNSWLElTOTvT:GRIGGK--vIiLTFNVAFCNFIFAKLMDS 242 

K KG D +N+E W E+D V+G K VLLT + MS 

Sbjct: 169 KK- KLGTS I SDRPNS IENREEFGHW -EIDCTLGEKSNKDKVLLTL VERKTRYAI I SEMSS 226 

Query: 243 KTAIETAKHIQVIKRTLYDNKRDFFELFPVILTDHGGEFARVDDIEIDVCGQSQLFFCDP 302 

+ 1 K + IK L F E+F I DNG EFA + + E+ +++++F P 

Sbjct: 227 HSTISVTKALDKIKEFLGSK FSEVFKSITADNGSEFADLSEFELKT--KTKVYFTHP 281 

Query: 303 MSDQKARIEKlfflTLvOTILPKGTSFDNLTQEDINljAL 362

S +K E+++ L+R +PKG + + E 1+ + +N++ R+ L+ KT ELF 
Sbjct: 282 YSSFEKGTHERHNGLIRRFIPKGKRISDYSLETISFIENWMNTLPRKLLDYKTPEELFEI 341

Query: 363 TYGK 366
K

Sbjct: 342 HLDK 345 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1509 

A DNA sequence (GBSx1597) was identified in S.agalactiae <SEQ ID 4641> which encodes the amino
acid sequence <SEQ ID 4642>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have an uncleavable N-term signal seq

232 ( 215 - 232)

INTEGRAL Likelihood = -1.22 Transmembrane 147 - 163 ( 147 - 165)

Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9431> which encodes amino acid sequence <SEQ ID 9432>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07666 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 112/224 (50%), Positives = 150/224 (66%), Gaps = 2/224 (0%)



Query: 8 IKDILWFIIPSLFGVLLLMTPFKYNGMTTVAVSVISKTINQWINAVFPIHYIILLIIFIS 67
+KD LWF+IPS+ GV L M P + + T+ V+ ++K + ++ P I+L I +
Sbjct: 19 LKDYLWFLIPSIIGVGLFMVPIQKIJNAITIPVAFLAKQLQGALDDHLPAILTIMLAIVV- 77

Query: 68 CVI^CYRLFRPSFIEKNDLLKEISDITIFWLIIRLIGIALGIMTVLHIGPEMVVIGKETG 127
VL+ LF+P+ KN LLK + I WL++R++G MT+L +GPE VW + TG
Sbjct: 78 -VLSCVATLFKPNLFMKNGLLKSLFVIHPMV&VVRVLGFIFAFMTLLQLGPEAVWSEGTG 136

Query: 128 GLILFDLIGGLFTIFLAAGFILPFLTEFGLLEFVGVFLTPIMRPFFQLPGRSAVNCVASF 187
L+L+DL+ LFTIFL AG LPFL FGLLE GV L MRP F LPGRS+++C+AS+
Sbjct: 137 ALLLYDLLPLLFTIFLFAGLFLPFLLNFGLLELFGVLLNKFMRPVFTLPGRSSIDCLASW 196

Query: 188 VGDGTIGIALTDKQYVEGYYTSREAATISTTFSAVSITFCLXXL 231
+GDGTIG+ LT+KQY EG+YT REAA ISTTFS VSITF + L
Sbjct: 197 MGDGTIGVLLTNKQYEEGFYTQREAAVISTTFSVVSITFSIVVL 240

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1510 

A DNA sequence (GBSx1599) was identified in S.agalactiae <SEQ ID 4643> which encodes the amino
acid sequence <SEQ ID 4644>. This protein is predicted to be a Na/H antiporter homolog (kefB). Analysis of
this protein sequence reveals the following:




Possible site: 17 

>>> Seems to have an uncleavable N-term signal seq



INTEGRAL Likelihood = -10.14 Transmembrane 176 - 192 ( 171 - 203)
INTEGRAL Likelihood = -9.34 Transmembrane 353 - 369 ( 348 - 373)
INTEGRAL Likelihood = -9.24 Transmembrane 3 - 19 ( 1 - 26)
INTEGRAL Likelihood = -7.17 Transmembrane 145 - 161 ( 142 - 168)
INTEGRAL Likelihood = -7.01 Transmembrane 86 - 102 ( 81 - 108)
INTEGRAL Likelihood = -6.53 Transmembrane 52 - 68 ( 51 - 72)
INTEGRAL Likelihood = -5.79 Transmembrane 24 - 40 ( 23 - 49)
INTEGRAL Likelihood = -5.52 Transmembrane 214 - 230 ( 209 - 233)
INTEGRAL Likelihood = -4.04 Transmembrane 260 - 276 ( 258 - 278)
INTEGRAL Likelihood = -3.66 Transmembrane 287 - 303 ( 287 - 308)
INTEGRAL Likelihood = -2.71 Transmembrane 113 - 129 ( 112 - 129)
INTEGRAL Likelihood = -2.66 Transmembrane 332 - 348 ( 330 - 349)



Final Results 

bacterial membrane Certainty=0.5055 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA51756 GB:X73329 Na/H antiporter homolog [Lactococcus lactis] 
Identities = 208/376 (55%), Positives = 285/376 (75%), Gaps = 3/376 (0%) 



Query: 1 MHIIIQITIILLASVLATLISKRIGIPAVVGQLLVGIIIGPAMLGLVHQNQVLHVLSEIG 60
M+ I+Q+TI+L+AS++ATL S+R+ IPAV+GQ+LVGI+I P++LGLVH VL V+SEIG
Sbjct: 1 MNDILQLTIVLIASLIATLASRRLKIPAVIGQMLVGILIAPSVLGLVHSGHVLEVMSEIG 60

Query: 61 VILLMFLAGLEANFDLLKKYLKPSLLVAITGVIVPMALFYFLTRLFGFQINTAIFYGLVF 120
VILLMFLAGLE++ +LKK K S+LVAI GVIVP+ +F + FG+ ++T+ FYG+VF
Sbjct: 61 VILLMFIAGLESDLTVLKKNFKASMLVAIGGVIVPLIVPGLVAFSFGYGMSTSFFYGIVF 120

Query: 121 AATSISITVEVLQEYNRVKTDTGAIII/^VADDVIAVIiLLSWIA-- 178
AATS+SITVEVLQEY ++ T G+IILGAAV DD+LAVL+LS+F + GS +++ Q
Sbjct: 121 AATSVSITVEVLQEYGKLSTRAGSIILGAAVVDDILAVLILSIFTSFKNGGSGTHLFFQF 180

Query: 179 IIQLLFFVFLFICMKYLVPALFKLIEKVHFFEKYTILAILICFSLSILADKVGMSSIIGS 238
+++LLFF FLF+ K L+P +K ++K+ K TI+A++IC LS+LAD VGMS++IGS
Sbjct: 181 LLELLFFAFLFVVHK-LIPRFWKFVQKLPIANKNTIVALIICLGLSLLADSVGMSAVIGS 239

Query: 239 FFAGLAIGQTSFVDKVEHKISLLSYTFFIPIFFASIALPLKFDGMMSHLHTILIFTALAV 298
FFAGLAI QT K+E S + Y FIP+FF IA+ ++FD ++ H IL+FT LA+
Sbjct: 240 FFAGLAISQTEVSHKIEEYTSAIGYVIFIPVFFVLIAISVQFDSLIHHPWIILLFTLLAI 299

Query: 299 LSKLIPGYFVGRGFNFSKLESLTIGGGMVSRGEMALIIVQVGLAAKIISSTTYSELVIVV 358
L+K IP YFVG+ S ES+ IG GM+SRGEMALI+ Q+GL + II+ YSELVIV+
Sbjct: 300 LTKFIPAYFVGKSNKLSTGESMliIGTGMISRGEMALIVAQIGLTSAIITDEVYSELVIVI 359

Query: 359 ILSTIIAPFILKYSFK 374
IL+T++APF++K K
Sbjct: 360 ILATVLAPFLIKLVLK 375





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1511 

A DNA sequence (GBSx1600) was identified in S.agalactiae <SEQ ID 4645> which encodes the amino
acid sequence <SEQ ID 4646>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq.






Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14269 GB:Z99116 ypuA [Bacillus subtilis] 
Identities = 86/319 (26%), Positives = 147/319 (45%), Gaps = 34/319 (10%)



Query: 3 IKKLLFAGLAFILFTIASPAYARSDVQKVIDETYVQPDYVLGYSLNQEQRAQTLQLLNYD 62
+KK+ LA + LP++D ++V LG L++ + + L +N
Sbjct: 1 MICKIWIGMLAAAVLLLMVPKVSLADA--AVGDVIV TLGADLSESDKQKVLDEMNVP 54

Query: 63 ESRDTKVKTLNTSSYAKIMNIADDAS IQLY SSVKIKKLGSNDTIAVNIVTPENITK 118
++ T V N + + +A I SS+ I K GS +N+ T NI +
Sbjct: 55 DNATT- VTVTNKEEHEYLGKYISNAQIGSRAISSSSITIAKKGSG LNVET-HNISG 108

Query: 119 OTEDMYRNAAVTLGIEHATISVAAPIKVTGESALAGIYYSLE-KNGASVSSENKQLAQEE 177
+T++MY NA +T G++ A + V AP +V+G +AL G+ + E + ++S + KQ+A +E
Sbjct: 109 ITDEMYLNALMTAGVKDAKVYVTAPFEVSGTAALTGLIKAYEVSSDEAISEDVKQVANQE 168

Query: 178 LSTLSGINAENKGKEGYDADKLNVALTDIKSAVAKGGSDLSKDDIRKIVEETLKNYHLDN 237
LTS+ + GE A +IK AKG +KDIK V++ + L+
Sbjct: 169 LVTTSEL-GDKIGNENAAA LIAKIKEEPAKNGVPDNKADIEKQVDDAASD--LNV 220

Query: 238 AVTENQINLIWFAVNLSQSNVIKNSDFTNTLNNLKDNIVSKAGSKFKNINVNFNA^ 297
+T++Q N +V S N +KN+D + D + KA K + +
Sbjct: 221 TLTDSQKNQLV SLFNKMKNADI--DWGQVSDQL-DKAKDKITKFIESDEGKNFI 271

Query: 298 ESGKHFLANIWQQIVNFFQ 316
+ F +IW IV+ F+
Sbjct: 272 QKVIDFFVSIWNAIVSIFK 290





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1512 

A repeated DNA sequence (GBSx1602) was identified in S.agalactiae <SEQ ID 4647> which encodes the
amino acid sequence <SEQ ID 4648>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0603 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15719 GB:Z99122 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 76/138 (55%), Positives = 91/138 (65%), Gaps = 12/138 (8%)

Query: 1 MKLKAVHHIAIIVSDYEKSKDFYVNKLGFEIIRENHRPERHDYKLDLRC-GDIELEIFGN 59 

M LK++HHIAII SDYEKSK FYV+KLGF++I+E +R ER YKLDL G +E+F 
Sbjct: 1 MLLKSIHHIAIICSDYEKSKAFYVHKLGFQVTQETYREERGSYKLDLSLNGSYVIELF-- 58 

Query: 60 RLDDPEYETPPQRIGRPNWPREACGLRHLAFYVPDVEAYKVELENLGIFVEPIRYDDYTG 119 

+ PP+R RP EA GLRHLAF V ++ EL GI EPIR D TG
Sbjct: 59 SFPDPPERQTRP FAAGLRHLAFTVGSLDKAVQELHEKGIETEPIRTDPLTG 109

Query: 120 KKMTFFFDPDGLPLELHE 137

K+ TFFFDPD LPLEL+E 
Sbjct: 110 KRFTFFFDPDQLPLELYE 127 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4649> which encodes the amino acid 
sequence <SEQ ID 4650>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence



An alignment of the GAS and GBS proteins is shown below. 

Identities = 99/137 (72%), Positives = 116/137 (84%)

Query: 1 MKLKAVHHIAIIVSDYEKSKDFYVNKLGFEIIRENHRPERHDYKLDLRCGDIELEIFGNR 60 

MKL A+HH+AIIVSDY SKDFYVNKLGFEIIREN+RP++HDYKLDL CG IELEIFG 
Sbjct: 2 MKimiHHVAIIVSDYHLSKDFYVNKLGFEIIRENYRPDKHDYKLDLSCGRIELEIFGKV 61

Query: 61 LDDPEYETPPQRIGRPNWPREACGLRHLAFYVPDVEAYKVELENLGIFVEPIRYDDYTGK 120 

DP Y+ PP+R+ P + EACGLRHLAF V ++E+Y +L++LGI VEPIR+DDYTG+ 
Sbjct: 62 TSDPNYQAPPKRVSEPEFKSEACGLRHLAFRVTNIESYVDDLKSLGIPVEPIRHDDYTGE 121

Query: 121 KMTFFFDPDGLPLELHE 137 

KMTFFFDPDGLPLELHE 
Sbjct: 122 KMTFFFDPDGLPLELHE 138 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1513 

A DNA sequence (GBSx1603) was identified in S.agalactiae <SEQ ID 4651> which encodes the amino
acid sequence <SEQ ID 4652>. This protein is predicted to be an alpha-amylase. Analysis of this protein
sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -11.62 Transmembrane 14 - 30 ( 7 - 36)



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG41778 GB:AF213261 sortase [Streptococcus gordonii]
Identities = 136/247 (55%), Positives = 174/247 (70%), Gaps = 2/247 (0%)

Query: 2 RNKKKSHGFFNFTOWLLVVLLIIVGLALVFNKPIRNAFIAHQSNHYQISRVS 61 

R KK N + +L V+L++V LAL+FN IRN + +N YQ+S+VSKK IEKNK 

Sbjct: 6 RRAKKKRSRRNIILNILSVILLLVALALIFNSSIRNMIMVTWTNKYQVSKVSKKEIEKNK 65 

Query: 62 KSKTSYDFSSVKSISTESILSAQTKSHNLPVIGGIAIPDVEINLPIFKGLGNTELSYGAG 121 

SK S++F V+ +STE++L+AQ K+ LPVIGGIAIP++ +NLPIF GL N L YGAG 
Sbjct: 66 ASKGSFNFEKVEPLSTEAVimQWKAQQLPvTIGGIAIPELSLMLPIFNGLENAGLYYGAG 125 

Query: 122 TMKENQIMGGPNNYALASHHVFGLTGSSKMLFSPLEHAKKGMKVYLTDKSKVYTYTITEI 181
TMKE Q M G NYALASHHVFG+TG+++MLFSPL+ AK GMK+YLTDK KVYTY+IT + 



Final Results 



bacterial cytoplasm Certainty=0.1205 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



Final Results

bacterial membrane Certainty=0.5649 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Sbjct: 126 TMKETQEM-GKGNYALASHHVFGITGANEMLFSPLDRAKAGMKIYLTDKEKVYTYSITSV 184

Query: 182 SKVTPEHVEVIDD-TPGKSQLTLVTCTDPEATERIIVHAELEKTGEFSTADESILKAFSK 240

V PE V+V+DD G +++TLVTC D AT R IV LE + + IL F+K 
Sbjct: 185 ENVEPERVDVVDDARDGTAEVTLVTCEDAAATSRTIVKGVLESETPYKETPKKILNYFNK 244 

Query: 241 KYNQINL 247 

YNQ+ L 
Sbjct: 245 SYNQMQL 251 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4653> which encodes the amino acid
sequence <SEQ ID 4654>. Analysis of this protein sequence reveals the following: 

Possible site: 34 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -8.12 Transmembrane 18 - 34 ( 13 - 38)

INTEGRAL Likelihood = -0.32 Transmembrane 94 - 110 ( 94 - 110) 

Final Results 

bacterial membrane Certainty=0.4248 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAA73122 GB:M77279 alpha-amylase [unidentified cloning vector] 
Identities = 60/122 (49%), Positives = 85/122 (69%)

Query: 7 RRKIKSMSWARKLLIAVLLILGLALLFNKPIRNTLIARNSNKYQVTKVSKKQIKKNKEAKS 67

+ K + +W L+ +L I+GLAL+FN IR+ ++ +NS Y V+K+ +KKN ++ 
Sbjct: 4 KEKKRGKNWLINSLLVLLFIIGIiALIFlOTQIRSWWQQNSRSYAVSKLKP^ 64

Query: 68 TFDFQAVEPVSTESVLQAQMAAQQLPVIGGIAIPELGINLPIFKGLGNTELIYGAGTMKEE 127

TFDF +VE +STE+V++AQ + LPVIG IAIP + INLPIFKGL N L+ GAGTMKE+ 
Sbjct: 65 TFDFDSVESLSTEAVMKAQFENKNLPVIGAIAIPSVEINLPIFKGLSNVALLTGAGTMKED 124 

An alignment of the GAS and GBS proteins is shown below.

Identities = 147/245 (60%), Positives = 192/245 (78%)

Query: 2 RNKKKSHGFFNFWWLLVVLLIIVGLALVFNKPIRNAFIAHQSNHYQISRVSKKTIEKNK 61
+ K++ ++ R LL+ +L+ I +GLAL+FNKPIRN IA SN YQ+++VSKK I+KNK

Sbjct: 4 KQKRRKIKSMSWARKLLIAVLLILGLALLFNKPIRNTLIARNSNKYQVTKVSKKQIKKNK 63

Query: 62 KSKTSYDFSSVKSISTESILSAQTKSHNLPVIGGIAIPDVEINLPIFKGLGNTELSYGAG 121 

++K+++DF +V+ +STES+L AQ + LPVIGGIAIP++ INLPIFKGLGNTEL YGAG
Sbjct: 64 EAKSTFDFQAVEPVSTESVLQAQMAAQQLPVIGGIAIPELGINLPIFKGLGNTELIYGAG 123

Query: 122 TMKENQIMGGPNNYALASHHVFGLTGSSKMLFSPLEHAKKGMKVYLTDKSKVYTYTITEI 181
TMKE Q+MGG NNY+LASHH+FG+TGSS+MLFSPLE A+ GM +YLTDK K+Y Y I ++ 
Sbjct: 124 TMKEEQVMGGENNYSLASHHIFGITGSSQMLFSPLERAQNGMSIYLTDKEKIYEYIIKDV 183

Query: 182 SKVTPEHVEVIDDTPGKSQLTLVTCTDPEATERIIVHAELEKTGEFSTADESILKAFSKK 241

V PE V+VIDDT G ++TLVTCTD EATERIIV EL+ +F A +LKAF+ 
Sbjct: 184 FTVAPERVDVIDDTAGLKEVTLVTCTDIEATERIIVKGELKTEYDFDKAPADVLKAFNHS 243

Query: 242 YNQIN 246 
YNQ++

Sbjct: 244 YNQVS 248 

SEQ ID 4652 (GBS266) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 49 (lane 11; MW 26kDa).

GBS266-His was purified as shown in Figure 205, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1514 

A DNA sequence (GBSx1604) was identified in S.agalactiae <SEQ ID 4655> which encodes the amino
acid sequence <SEQ ID 4656>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1934 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
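Each "Final Results" block lists one certainty score per candidate compartment, and the Affirmative call carries the prediction. As a hedged illustration only (this is not PSORT's own code, and the line layout is assumed from the text above), the Python sketch below parses such lines, tolerates the stray spaces that appear inside some scores in this document, and returns the highest-scoring compartment. The names `parse_final_results` and `best_site` are hypothetical.

```python
import re

# Matches e.g. "bacterial cytoplasm Certainty=0.1934 (Affirmative)" and tolerates
# the OCR-style spacing seen in this text, such as "0 . 1934" or "0. 188 8".
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s+-*\s*Certainty\s*=\s*"
    r"(?P<whole>\d+)\s*\.\s*(?P<frac>[\d ]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each compartment name to a (certainty, call) pair."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            digits = m.group("frac").replace(" ", "")
            value = float(f"{m.group('whole')}.{digits}")
            results[m.group("site")] = (value, m.group("call"))
    return results


def best_site(results: dict) -> str:
    """Return the compartment with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])


sample = """bacterial cytoplasm Certainty=0 . 1934 (Affirmative) < suco
bacterial membrane Certainty=0. 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco"""
print(best_site(parse_final_results(sample)))  # cytoplasm
```

When every score is 0.0000, as in one of the later examples, the maximum is a tie and the returned compartment is simply the first one listed, so that result should not be read as a prediction.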

A related DNA sequence was identified in S.pyogenes <SEQ ID 4657> which encodes the amino acid 
sequence <SEQ ID 4658>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1934 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 711/819 (86%) , Positives = 767/819 (92%) 



Query: 1 MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRILYGMNELGVTPDKP 60
MQD+NL+DVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRILYGMNELGVTPDKP
Sbjct: 1 MQDRNLIDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRILYGMNELGVTPDKP 60

Query: 61 HKKSARITGDVMGKYHPHGDSSIYEAMVRMAQWWSYRHMLVDGHGNFGSMDGDGAAAQRY 120
HKKSARITGDVMGKYHPHGDSSIYEAMVRMAQWWSYRHMLVDGHGNFGSMDGDGAAAQRY
Sbjct: 61 HKKSARITGDVMGKYHPHGDSSIYEAMVRMAQWWSYRHMLVDGHGNFGSMDGDGAAAQRY 120

Query: 121 TEARMSKIALEMLRDINKNTVDFQDNYDGSEREPLVLPARFPNLLVNGATGIAVGMATNI 180
TEARMSKIALE+LRDINKNTV+FQDNYDGSEREP+VLPARFPNLLVNGATGIAVGMATNI
Sbjct: 121 TEARMSKIALELLRDINKNTVNFQDNYDGSEREPVVLPARFPNLLVNGATGIAVGMATNI 180

Query: 181 PPHNLGESIDAVKLVMDNPDVTTRELMEVIPGPDFPTGALVMGRSGIHRAYETGKGSIVL 240
PPHNL ESIDAVK+VM++PD TTRELMEVIPGPDFPTGALVMGRSGIHRAY+TGKGSIVL
Sbjct: 181 PPHNLAESIDAVKMVMEHPDCTTRELMEVIPGPDFPTGALVMGRSGIHRAYDTGKGSIVL 240

Query: 241 RSRTEIETTSNGKERIWTEFPYGVNKTKVHEHIVRLAQEKRIEGITAVRDESSREGVRF 300
RSRTEIETT G+ERIWTEFPYGVNKTKVHEHI VRLAQEKR+EGITAVRDESSREGVRF
Sbjct: 241 RSRTEIETTQTGRERIWTEFPYGVNKTKVHEHIVRLAQEKRLEGITAVRDESSREGVRF 300

Query: 301 VIEVRRAASAWILNNLFKLTSLQTNFSFNMLAIEKGVPKILSLRQIIDNYIEHQKEVIV 360
VIE+RR ASA VILNNLFKLTSLQTNFSFNMLAIE GVPKILSLRQIIDNYI HQKEVI+
Sbjct: 301 VIEIRREASAWILNNLFKLTSLQTNFSFNMLAIENGVPKILSLRQIIDNYISHQKEVII 360

Query: 361 RRTQFDKAKAGARAHILEGLLVALDHLDEVITIIRNSETDTIAQAELMSRFELSERQSQA 420
RRT+FDK KA ARAHILEGLL+ALDHLDEVI IIRNSETD IAQ ELMSRF+LSERQSQA
Sbjct: 361 RRTRFDKDKAEARAHILEGLLIALDHLDEVIAIIRNSETDVIAQTELMSRFDLSERQSQA 420

Query: 421 ILDMRLRRLTGLERDKIQSEYNDLLALIADLADILAKPERVVTIIKEEMDEVKRKYADAR 480
ILDMRLRRLTGLERDKIQSEY+DLLALIADL+DILAKPER++TIIKEEMDE+KRKYA+ R
Sbjct: 421 ILDMRLRRLTGLERDKIQSEYDDLLALIADLSDILAKPERIITIIKEEMDEIKRKYANPR 480

Query: 481 RTELMIGEVLSLEDEDLIEEEDVLITLSNKGYIKRLAQDEFRAQKRGGRGIQGTGVNNDD 540
RTELM+GEVLSLEDEDLIEEEDVLITLSNKGYIKRLAQDEFRAQKRGGRG+QGTGVNNDD
Sbjct: 481 RTELMVGEVLSLEDEDLIEEEDVLITLSNKGYIKRLAQDEFRAQKRGGRGVQGTGVNNDD 540

Query: 541 FVRELVSTSTHDTVLFFTNLGRVYRLKAYEIPEYGRTAKGLPIVNLLKLDEGETIQTIIN 600
FVREL+STSTHDT+LFFTN GRVYRLKAYEIPEYGRTAKGLPIVNLLKL++GETIQTIIN
Sbjct: 541 FVRELISTSTHDTLLFFTNFGRVYRLKAYEIPEYGRTAKGLPIVNLLKLEDGETIQTIIN 600

Query: 601 ARKEDVANKYFFFTTQQGIVKRTSVSEFSNIRQNGLRA 660
ARKE+ A K FFFTT+QGIVKRT VSEF+NIRQNGLRA+ LKE D+LINVLL +D+I
Sbjct: 601 ARKEETAGKSFFFTTKQGIVKRTEVSEFNNIRQNGLRALKLKEGDQLINVLLTSGQDDII 660

Query: 661 IGTRTGYSVRFKAMAVRNMGRTATGVRGVNLREGDKWGASRIVNGQEVLIITEKGYGKR 720
IGT +GYSVRF ++RNMGR+ATGVRGV LRE D+WGASRI + QEVL+ITE G+GKR
Sbjct: 661 IGTHSGYSVRFNEASIRNMGRSATGVRGVKLREDDRWGASRIQDNQEVLVITENGFGKR 720

Query: 721 TEASEYPTKGRGGKGIKTANITAKNGPLARLVTINGNEDIMVITDTGVIIRTNVANISQT 780
T A++YPTKGRGGKGIKTANIT KNG LA LVT++G EDIMVIT+ GVIIRTNVANISQT
Sbjct: 721 TSATDYPTKGRGGKGIKTANITPKNGQLAGLVTVDGTEDIMVITNKGVIIRTNVANISQT 780

Query: 781 GRSTMGVKVMRLDQEAKIVTVALVEQEIEDKSNIEDTKE 819
GR+T+GVK+M+LD +AKIVT LV+ E + I +E
Sbjct: 781 GRATLGVKIMKLDADAKIVTFTLVQPEDSSIAEINTDRE 819





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
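The alignment headers report matched positions as a count over the aligned length, followed by a percentage. A minimal Python sketch, assuming only the "Name = a/b (p%)" layout used in this document (the function names are hypothetical), extracts those numbers so the fractions can be recomputed alongside the printed percentages:

```python
import re

# Matches fields such as "Identities = 711/819 (86%)" in an alignment header.
FIELD_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_header(line: str) -> dict:
    """Return {field: (count, aligned_length, printed_percent)}."""
    return {name: (int(a), int(b), int(p)) for name, a, b, p in FIELD_RE.findall(line)}


def fractions(fields: dict) -> dict:
    """Recompute count / aligned_length for each parsed field."""
    return {name: count / length for name, (count, length, _) in fields.items()}


header = "Identities = 711/819 (86%) , Positives = 767/819 (92%)"
print(parse_header(header))
print(round(fractions(parse_header(header))["Identities"], 3))  # 0.868
```

The sketch keeps the printed percentage instead of recomputing it, because the rounding applied by the alignment program is not stated here; 711/819 is about 0.868, while the header prints 86%.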

Example 1515 

A DNA sequence (GBSx1605) was identified in S.agalactiae <SEQ ID 4659> which encodes the amino
acid sequence <SEQ ID 4660>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA04010 GB:AJ000336 L-lactate dehydrogenase [Streptococcus pneumoniae]
Identities = 290/329 (88%) , Positives = 313/329 (94%) , Gaps = 1/329 (0%) 

Query: 1 MTATKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPALFDKAVGDAEDLSHALAF 60 

MT+TKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIP L +KAVGDA DLSHALAF
Sbjct: 1 MTSTKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPQLHEKAVGDALDLSHALAF 60 

Query: 61 TSPKKIYAATYADCADADLWITAGAPQKPGETRLDLVGKNLAINKSIVTQVVESGFNGI 120

TSPKKIYAA Y+DCADADLWITAGAPQKPGETRLDLVGKNLAINKSIVTQWESGF GI 
Sbjct: 61 TSPKKIYAAQYSDCADADLWITAGAPQKPGETRLDLVGKNLAINKSIVTQWESGFKGI 120 

Query: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALADKIGVDARSVHAYIMGE 180 

FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALA+K+ VDARSVHAYIMGE 
Sbjct: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAEKLDVDARSVHAYIMGE 180

Query: 181 HGDSEFAVWSHANVAGVQLEQWLQENRDIDEQGLVDLFISVRDAAYSIINKKGATYYGIA 240

HGDSEFAVWSHAN+AGV LE++L++ +++ E L++LF VRDAAY+IINKKGATYYGIA
Sbjct: 181 HGDSEFAVWSHANIAGVNLEEFLKDTQNVQEAELIELFEGVRDAAYTIINKKGATYYGIA 240 

Query: 241 VALARITKAILDDENAVLPLSVYQEGQYGDVKDVFIGQPAIVGAHGIVRPVNIPLNDAEL 300

VALARITKAILDDENAVLPLSV+QEGQYG V++VFIGQPA+VGAHGIVRPVNIPLNDAE
Sbjct: 241 VALARITKAILDDENAVLPLSVFQEGQYG-VENVFIGQPAVVGAHGIVRPVNIPLNDAET 299



Query: 301 QKMQASAEQLKDIIDEAWKNPEFQEASKN 329

QKMQASA++L+ IIDEAWKNPEFQEASKN
Sbjct: 300 QKMQASAKELQAIIDEAWKNPEFQEASKN 328
A related DNA sequence was identified in S.pyogenes <SEQ ID 4661> which encodes the amino acid
sequence <SEQ ID 4662>. Analysis of this protein sequence reveals the following: 

Possible site: 25 


>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.17 Transmembrane 106 - 122 ( 106 - 122) 
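The INTEGRAL lines give a likelihood score, a predicted transmembrane segment, and a second residue range in parentheses. A small Python sketch for collecting them, assuming that column order (the meaning of the bracketed range is not defined in this text, so the sketch stores it under the neutral hypothetical names `outer_start` and `outer_end`):

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int  # first number in parentheses (meaning not defined in the text)
    outer_end: int    # second number in parentheses


TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm(text: str) -> list:
    """Collect INTEGRAL transmembrane predictions, ordered by start position."""
    segments = [
        TMSegment(float(m[0]), *(int(x) for x in m[1:]))
        for m in TM_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


report = """INTEGRAL Likelihood = -8.92 Transmembrane 73 - 89 ( 69 - 97)
INTEGRAL Likelihood = -1.17 Transmembrane 106 - 122 ( 106 - 122)"""
for seg in parse_tm(report):
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

Every segment listed in this section spans 17 residues (for example 106 - 122 and 73 - 89), which gives a quick consistency check when reading these lines.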

Final Results 

bacterial membrane Certainty=0.1468 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB81558 GB:U60997 L(+)-lactate dehydrogenase [Streptococcus bovis]

Identities = 278/329 (84%) , Positives = 297/329 (89%) , Gaps = 2/329 (0%) 

Query: 1 MTATKQHKKVILVGDGAVGSSYAFALVTQNIAQELGIIDIFK--EKTQGDAEDLSHALAF 58 
MTATKQHKKVILVGDGAVGSSYAFALV Q IAQELGII+I + K GDAEDLSHALAF

Sbjct: 1 MTATKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPQLFNKAVGDAEDLSHALAF 60 

Query: 59 TSPKKIYAADYSDCHDADLVVLTAGAPQKPGETRLDLVEKNLRINKEVVTQIVASGFKGI 118 
TSPKKIYAA Y DC DADLW+TAGAPQKPGETRLDLV KNL INK +VT++V SGFKGI 
Sbjct: 61 TSPKKIYAAKYEDCADADLWITAGAPQKPGETRLDLVGKNLAINKSIVTEWKSGFKGI 120

Query: 119 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAAKIGVDARSVHAYIMGE 178

FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALA K+ VDARSVHAYIMGE
Sbjct: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAEKLDVDARSVHAYIMGE 180


Query: 179 HGDSEFAVWSHANVAGVGLYDWLQANRDIDEQGLVDLFISVRDAAYSIINKKGATFYGIA 238

HGDSEFAVWSHANVAGV L +L+ ++++E LV+LF VRDAAYSIINKKGATFYGIA
Sbjct: 181 HGDSEFAVWSHANVAGWLESYLKDVQNVEEAELVELFEGVRDAAYSIINKKGATFYGIA 240

Query: 239 VALARITKAILDDENAVLPLSVFQEGQYEGVEDCYIGQPAIVGAYGIVRPVNIPLNDAEL 298

VALARITKAIL+DENAVLPLSVFQEGQY V DCYIGQPAIVGA+GIVRPVNIPLNDAE
Sbjct: 241 VALARITKAILNDENAVLPLSVFQEGQYANVTDCYIGQPAIVGAHGIVRPVNIPLNDAEQ 300

Query: 299 QKMQASANQLKAIIDEAFAKEEFASAAKN 327
QKM+ASA +LKAIIDEAF+KEEFASA KN

Sbjct: 301 QKMEASAKELKAIIDEAFSKEEFASACKN 329

An alignment of the GAS and GBS proteins is shown below. 

Identities = 286/329 (86%) , Positives = 299/329 (89%) , Gaps = 2/329 (0%) 


Query: 1 MTATKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPALFDKAVGDAEDLSHALAF 60 

MTATKQHKKVILVGDGAVGSSYAFALV Q IAQELGII+I +K GDAEDLSHALAF 
Sbjct: 1 MTATKQHKKVILVGDGAVGSSYAFALVTQNIAQELGIIDI--FKEKTQGDAEDLSHALAF 58 

Query: 61 TSPKKIYAATYADCADADLWITAGAPQKPGETRLDLVGKNLAINKSIVTQWESGFNGI 120

TSPKKIYAA Y+DC DADLW+TAGAPQKPGETRLDLV KNL INK +VTQ+V SGF GI 
Sbjct: 59 TSPKKIYAADYSDCHDADLWLTAGAPQKPGETRLDLVEKNLRINKEWTQIVASGFKGI 118 

Query: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALADKIGVDARSVHAYIMGE 180 
FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALA KIGVDARSVHAYIMGE

Sbjct: 119 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAAKIGVDARSVHAYIMGE 178 

Query: 181 HGDSEFAVWSHANVAGVQLEQWLQENRDIDEQGLVDLFISVRDAAYSIINKKGATYYGIA 240
HGDSEFAVWSHANVAGV L WLQ NRDIDEQGLVDLFISVRDAAYSIINKKGAT+YGIA
Sbjct: 179 HGDSEFAVWSHANVAGVGLYDWLQANRDIDEQGLVDLFISVRDAAYSIINKKGATFYGIA 238

Query: 241 VALARITKAILDDENAVLPLSVYQEGQYGDVKDVFIGQPAIVGAHGIVRPVNIPLNDAEL 300

VALARITKAILDDENAVLPLSV+QEGQY V+D +IGQPAIVGA+GIVRPVNIPLNDAEL
Sbjct: 239 VALARITKAILDDENAVLPLSVFQEGQYEGVEDCYIGQPAIVGAYGIVRPVNIPLNDAEL 298
Query: 301 QKMQASAEQLKDIIDEAWKNPEFQEASKN 329

QKMQASA QLK IIDEA+ EF A+KN 
Sbjct: 299 QKMQASANQLKAIIDEAFAKEEFASAAKN 327
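Over a gapless block, the identity count is simply the number of positions where the two sequences carry the same residue. The hedged sketch below (hypothetical helper names; it marks identities only, not the "+" conservative substitutions, which depend on a scoring matrix not given here) applies this to the last GAS/GBS block above:

```python
def count_identities(query: str, subject: str) -> int:
    """Count positions where two gapless, equal-length segments agree."""
    if len(query) != len(subject):
        raise ValueError("segments must have the same length")
    return sum(q == s for q, s in zip(query, subject))


def identity_mask(query: str, subject: str) -> str:
    """Build a match-style line: the residue where identical, a space otherwise."""
    return "".join(q if q == s else " " for q, s in zip(query, subject))


gbs = "QKMQASAEQLKDIIDEAWKNPEFQEASKN"
gas = "QKMQASANQLKAIIDEAFAKEEFASAAKN"
print(count_identities(gbs, gas), len(gbs))  # 20 29
print(identity_mask(gbs, gas))
```

The 20 identical positions out of 29 agree with the number of letters in the corresponding match line.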


SEQ ID 4660 (GBS312) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 7; MW 40kDa). 

GBS312-His was purified as shown in Figure 205, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1516 

A DNA sequence (GBSx1606) was identified in S.agalactiae <SEQ ID 4663> which encodes the amino
acid sequence <SEQ ID 4664>. This protein is predicted to be NADH oxidase (nox). Analysis of this 
protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1888 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC26485 GB:AF014458 NADH oxidase [Streptococcus pneumoniae] (ver 2)

Identities = 363/458 (79%) , Positives = 408/458 (88%) , Gaps = 3/458 (0%) 

Query: 1 MSKIVVVGTNHAGTAAIKTMLSNYGEANEIVTFDQNSNISFLGCGMALWIGEQIDGPEGL 60
MSKIVVVG NHAGTA I TML N+G NEIV FDQNSNISFLGCGMALWIGEQIDG EGL
Sbjct: 1 MSKIVVVGANHAGTACINTMLDNFGNENEIVVFDQNSNISFLGCGMALWIGEQIDGAEGL 60

Query: 61 FYSDKEQLESMGAKVYMNSPVLNIDYDKKEVTALVDGKEHVESYEKLILATGSQPIIPPI 120

FYSDKE+LE+ GAKVYMNSPVL+IDYD K VTA V+GKEH ESYEKLI ATGS PI+PPI
Sbjct: 61 FYSDKEKLEAKGAKVYMNSPVLSIDYDNKVVTAEVEGKEHKESYEKLIFATGSTPILPPI 120


Query: 121 KGVEIQEGSREFKATLENLQFVKLYQNSEEVIEKLAKPG--INRVAWGAGYIGVELAEA 178

+GVEI +G+REFKATLEN+QFVKLYQN+EEVI KL+ ++R+AWG GYIGVELAEA 

Sbjct: 121 EGVEIVKGNREFKATLENVQFVKLYQNAEEVINKLSDKSQHLDRIAWGGGYIGVELAEA 180 

Query: 179 FQRIGKEVTLVDVADTCMGGYYDRDFTDMMSKNLEDHGIRLAFGQAVQAVEGDGKVERLV 238

F+R+GKEV LVD+ DT + GYYD+DFT MM+KNLEDH IRLA GQ V+A+EGDGKVERL+
Sbjct: 181 FERLGKEWLVDIVDTVLNGYYDKDFTQMMAK]&^ 240

Query: 239 TDKETFDVDMVILAVGFRPNTELGAGKLDTFRNGAWVVDKKQETSVKDVYAIGDCATIWD 298
TDKE+FDVDMVILAVGFRPNT L GK++ FRNGA++VDKKQETS+ VYA+GDCAT++D

Sbjct: 241 TDKESFDVDMVILAVGFRPNTALADGKIELFRNGAFLVDKKQETSIPGVYAVGDCATVYD 300

Query: 299 NSRDDINYIALASNAVRTGIVAAHNACGTELEGAGVQGSNGISIYGLNMVSTGLTLEKAK 358
N+R D +YIALASNAVRTGIV A+NACG ELEG GVQGSNGISIYGL+MVSTGLTLEKAK
Sbjct: 301 NARKDTSYIALASNAVRTGIVGAYNACGHELEGIGVQGSNGISIYGLHMVSTGLTLEKAK 360

Query: 359 QAGYNAVETGFNDLQKPEFIKHNNHEVAIKIVYDKDSRVILGCQMVSHE-DVSMGIHMFS 417 

AGYNA ETGFNDLQKPEF+KH+NHEVAIKIV+DKDSR ILG QMVSH+ +SMGIHMFS 
Sbjct: 361 AAGYNATETGFNDLQKPEFMKHDNHEVAIKIVFDKDSREILGAQMVSHDIAISMGIHMFS 420






Query: 418 LAIQEKVTIEKLALTDIFFLPHFNKPYNYITMAALGAK 455

LAIQE VTI+KLALTD+FFLPHFNKPYNYITMAAL A+ 
Sbjct: 421 LAIQEHVTIDKLALTDLFFLPHFNKPYNYITMAALTAE 458
A related DNA sequence was identified in S.pyogenes <SEQ ID 4665> which encodes the amino acid
sequence <SEQ ID 4666>. Analysis of this protein sequence reveals the following: 
Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2068 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 362/456 (79%) , Positives = 403/456 (87%) 



Query: 1 MSKIVVVGTNHAGTAAIKTMLSNYGEANEIVTFDQNSNISFLGCGMALWIGEQIDGPEGL 60
MSKIVVVG NHAGTA IKTML+NYG+ANEIV FDQNSNISFLGCGMALWIGEQI GPEGL
Sbjct: 1 MSKIVVVGANHAGTACIKTMLTNYGDANEIVVFDQNSNISFLGCGMALWIGEQIAGPEGL 60

Query: 61 FYSDKEQLESMGAKVYMNSPVLNIDYDKKEVTALVDGKEHVESYEKLILATGSQPIIPPI 120
FYSDKE+LES+GAKVYM SPV +IDYD K VTALVDGK HVE+Y+KLI ATGSQPI+PPI
Sbjct: 61 FYSDKEELESLGAKVYMESPVQSIDYDAKTVTALVDGKNHVETYDKLIFATGSQPILPPI 120

Query: 121 KGVEIQEGSREFKATLENLQFVKLYQNSEEVIEKLAKPGINRVAWGAGYIGVELAEAFQ 180
KG EI+EGS EF+ATLENLQFVKLYQNS +VI KL I RVAWGAGYIGVELAEAFQ
Sbjct: 121 KGAEIKEGSLEFFATLENLQFVKLYQNSADVIAKLENKDIKRVAWGAGYIGVELAEAFQ 180

Query: 181 RIGKEVTLVDVADTCMGGYYDRDFTDMMSKNLEDHGIRLAFGQAVQAVEGDGKVERLVTD 240
R GKEV L+DV DTC+ GYYDRD TD+M+KN+E+HGI+LAFG+ V+ V G+GKVE+++TD
Sbjct: 181 RKGKEVVLIDVVDTCIAGYYDRDLTDLMAKNMEEHGIQLAFGETVKEVAGNGKVEKIITD 240

Query: 241 KETFDVDMVILAVGFRPNTELGAGKLDTFRNGAWVVDKKQETSVKDVYAIGDCATIWDNS 300
K +DVDMVILAVGFRPNT LG GK+D FRNGA++V+K+QETS+ VYAIGDCATI+DN+
Sbjct: 241 KNEYDVDMVILAVGFRPNTTLGNGKIDLFRNGAFLVNKRQETSIPGVYAIGDCATIYDNA 300

Query: 301 RDDINYIALASNAVRTGIVAAHNACGTELEGAGVQGSNGISIYGLNMVSTGLTLEKAKQA 360
D NYIALASNAVRTGIVAAHNACGT+LEG GVQGSNGISIYGL+MVSTGLTLEKAK+
Sbjct: 301 TRDTNYIALASNAVRTGIVAAHNACGTDLEGIGVQGSNGISIYGLHMVSTGLTLEKAKRL 360

Query: 361 GYNAVETGFNDLQKPEFIKHNNHEVAIKIVYDKDSRVILGCQMVSHEDVSMGIHMFSLAI 420
G++A T + D QKPEFI+H N V IKIVYDKDSR ILG QM + EDVSMGIHMFSLAI
Sbjct: 361 GFDAAVTEYTDNQKPEFIEHGNFPVTIKIVYDKDSRRILGAQMAAREDVSMGIHMFSLAI 420

Query: 421 QEKVTIEKLALTDIFFLPHFNKPYNYITMAALGAKD 456
QE VTIEKLALTDIFFLPHFNKPYNYITMAALGAKD
Sbjct: 421 QEGVTIEKLALTDIFFLPHFNKPYNYITMAALGAKD 456





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1517 

A DNA sequence (GBSx1607) was identified in S.agalactiae <SEQ ID 4667> which encodes the amino
acid sequence <SEQ ID 4668>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2319 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1518 

A DNA sequence (GBSx1608) was identified in S.agalactiae <SEQ ID 4669> which encodes the amino
acid sequence <SEQ ID 4670>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq.
Final Results

bacterial membrane Certainty=0.4100 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9805> which encodes amino acid sequence <SEQ ID 9806> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15146 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 172/318 (54%) , Positives = 234/318 (73%) 



Query: 5 LSLTTIFALLFSSMLIYATPLIFTSIGGTFSERGGIVNVGLEGIMVIGAFSGVVFNLEFA 64
+ + I +++ + L+YA PLI T++GG FSER G+VN+GLEG+M+IGAF+ V+FNL F
Sbjct: 1 MDIVQILSIIVPATLVYAAPLILTALGGVFSERSGVVNIGLEGLMIIGAFTSVLFNLFFG 60

Query: 65 SVFGDATPWISVLVGGLVGLIFSVIHAVATVNFRADHIISGTVLNLMAPSLAVFLIKVLY 124
G A PW+S+L G +FS+IHA A ++FRAD +SG +N++A +F++K++Y
Sbjct: 61 QELGAAAPWLSLIAAMAAGALFSLIHAAAAISFRADQTVSGVAINMLALGATLFIVKLIY 120

Query: 125 NKGQTDNIQESFGKFNFPILSDIPFVGDIFFKGTSLVGYIAILFSFLAWFILYKTRFGLR 184
K QTD I E F K PL DIP +G IFF +AI +F++WFIL+KT FGLR
Sbjct: 121 GKAQTDKIPEPFYKTKIPGLGDIPVLGKIFFSDVYYTSILAIALAFISWFILFKTPFGLR 180

Query: 185 LRSVGEHPQAADTLGINVYLMRYSGVLISGFLGGIGGAVYAQSISVNFAATTILGPGFIS 244
+RSVGEHP AADT+GINVY MRY GV+ISG GG+GG VYA +I+++F +TI G GFI+
Sbjct: 181 IRSVGEHPMAADTMGINVYKMRYIGVMISGLFGGLGGGVYASTIALDFTHSTISGQGFIA 240

Query: 245 LAAMIFGKWNPIGAMLASLFFGLSQSLAVIGSHLPLLSNIPTVYLQIAPYVLTIIVLAAF 304
LAA++FGKW+PIGA+ A+LFFG +QSL++IGS LPL +IP VY+ +APY+LTI+ L F
Sbjct: 241 LAALVFGKWHPIGALGAALFFGFAQSLSIIGSLLPLFKDIPNVYMLMAPYILTILALTGF 300

Query: 305 FGQAVAPKADGINYIKTK 322
G+A APKA+G+ YIK K
Sbjct: 301 IGRADAPKANGVPYIKGK 318





A related DNA sequence was identified in S.pyogenes <SEQ ID 4671> which encodes the amino acid
sequence <SEQ ID 4672>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -8.92 Transmembrane 73 - 89 ( 69 - 97) 
Final Results

bacterial membrane Certainty=0.4567 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15146 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 176/318 (55%) , Positives = 239/318 (74%)

Query: 5 MSLVTIFALLMSSMLIYATPLIFTSIGGTFSERSGVVNVGLEGIMVMGAFSGIVFNLEFA 64

M +V I ++++ + L+YA PLI T++GG FSERSGVVN+GLEG+M++GAF+ ++FNL F
Sbjct: 1 MDIVQILSIIVPATLVYAAPLILTALGGVFSERSGVVNIGLEGLMIIGAFTSVLFNLFFG 60


Query: 65 ETFGKATPWIAVLVGGIVGLIFSLIHAVATINFRADHIVSGTVLNLLAPSFAVFLVKAMY 124 

+ G A PW+++L G +FSLIHA A I+FRAD VSG +N+LA +F+VK +Y 

Sbjct: 61 QELGAAAPWLSLIAAMAAGALFSLIHAAAAISFRADQTVSGVAINMLALGATLFIVKLIY 120 

Query: 125 GKGQTDNIQQSFGKFDFPGLSQIPVIGDIFFKNTSLIGYFAIAFSFFAWFLLYKTRFGLR 184

GK QTD I + F K PGL IPV+G IFF + AIA +F +WF+L+KT FGLR 

Sbjct: 121 GKAQTDKIPEPFYKTKIPGLGDIPVLGKIFFSDVYYTSILAIALAFISWFILFKTPFGLR 180 

Query: 185 LRSVGEHPQAADTLGINVYLMKYYGVMISGFLGGIGGAVYAQSISVNFAVTTILGPGFIA 244 
+RSVGEHP AADT+GINVY M+Y GVMISG GG+GG VYA +I+++F +TI G GFIA

Sbjct: 181 IRSVGEHPMAADTMGINVYKMRYIGVMISGLFGGLGGGVYASTIALDFTHSTISGQGFIA 240

Query: 245 LAAMIFGKWNPVGAMLSSLFFGLSQSLAVIGAQLPLLEKIPTVYLQIAPYMVTIIILAAF 304
LAA++FGKW+P+GA+ ++LFFG +QSL++IG+ LPL + IP VY+ +APY++TI+ L F 
Sbjct: 241 LAALVFGKWHPIGALGAALFFGFAQSLSIIGSLLPLFKDIPNVYMLMAPYILTILALTGF 300

Query: 305 FGQAVAPKADGINYIKSK 322 

G+A APKA+G+ YIK K 
Sbjct: 301 IGRADAPKANGVPYIKGK 318 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 272/322 (84%) , Positives = 301/322 (93%) 

Query: 1 MVSKLSLTTIFALLFSSMLIYATPLIFTSIGGTFSERGGIVNVGLEGIMVIGAFSGVVFN 60
+V+K+SL TIFALL SSMLIYATPLIFTSIGGTFSER G+VNVGLEGIMV+GAFSG+VFN

Sbjct: 1 VVNKMSLVTIFALLMSSMLIYATPLIFTSIGGTFSERSGVVNVGLEGIMVMGAFSGIVFN 60

Query: 61 LEFASVFGDATPWISVLVGGLVGLIFSVIHAVATVNFRADHIISGTVLNLMAPSLAVFLI 120 
LEFA FG ATPWI+VLVGG+VGLIFS+IHAVAT+NFRADHI+SGTVLNL+APS AVFL+
Sbjct: 61 LEFAETFGKATPWIAVLVGGIVGLIFSLIHAVATINFRADHIVSGTVLNLLAPSFAVFLV 120

Query: 121 KVLYNKGQTDNIQESFGKFNFPILSDIPFVGDIFFKGTSLVGYIAILFSFLAWFILYKTR 180 

K +Y KGQTDNIQ+SFGKF+FP LS IP +GDIFFK TSL+GY AI FSF AWF+LYKTR 
Sbjct: 121 KAMYGKGQTDNIQQSFGKFDFPGLSQIPVIGDIFFKNTSLIGYFAIAFSFFAWFLLYKTR 180 


Query: 181 FGLRLRSVGEHPQAADTLGINVYLMRYSGVLISGFLGGIGGAVYAQSISVNFAATTILGP 240

FGLRLRSVGEHPQAADTLGINVYLM+Y GV+ISGFLGGIGGAVYAQSISVNFA TTILGP 
Sbjct: 181 FGLRLRSVGEHPQAADTLGINVYLMKYYGVMISGFLGGIGGAVYAQSISVNFAVTTILGP 240

Query: 241 GFISLAAMIFGKWNPIGAMLASLFFGLSQSLAVIGSHLPLLSNIPTVYLQIAPYVLTIIV 300

GFI+LAAMIFGKWNP+GAML+SLFFGLSQSLAVIG+ LPLL IPTVYLQIAPY++TII+ 
Sbjct: 241 GFIALAAMIFGKWNPVGAMLSSLFFGLSQSLAVIGAQLPLLEKIPTVYLQIAPYMVTIII 300

Query: 301 LAAFFGQAVAPKADGINYIKTK 322 
LAAFFGQAVAPKADGINYIK+K

Sbjct: 301 LAAFFGQAVAPKADGINYIKSK 322 
A related GBS gene <SEQ ID 8829> and protein <SEQ ID 8830> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 8.61 
GvH: Signal Score (-7.5): -1.53 
Possible site: 22 



>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 8 value: -7.75 threshold: 0.0
INTEGRAL Likelihood = -7.75 Transmembrane 160 - 176 ( 157 - 179)
INTEGRAL Likelihood = -7.38 Transmembrane 73 - 89 ( 70 - 97)
INTEGRAL Likelihood = -5.47 Transmembrane 289 - 305 ( 284 - 312)
INTEGRAL Likelihood = -4.09 Transmembrane 107 - 123 ( 106 - 124)
INTEGRAL Likelihood = -3.24 Transmembrane 43 - 59 ( 43 - 59)
INTEGRAL Likelihood = -1.91 Transmembrane 258 - 274 ( 258 - 275)
INTEGRAL Likelihood = -1.33 Transmembrane 234 - 250 ( 233 - 251)
INTEGRAL Likelihood = -0.00 Transmembrane 209 - 225 ( 209 - 225)
PERIPHERAL Likelihood = 3.34 139
modified ALOM score: 2.05
*** Reasoning Step: 3



Final Results 

bacterial membrane Certainty=0.4100 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00914(313 - 1266 of 1566) 

EGAD|108729|BS3151(1 - 318 of 319) hypothetical protein {Bacillus subtilis}
GP|1934814|emb|CAB07939.1| |Z93937 unknown {Bacillus subtilis}

GP|2635653|emb|CAB15146.1| |Z99120 similar to hypothetical proteins {Bacillus subtilis}
PIR|F70009|F70009 conserved hypothetical protein yufQ - Bacillus subtilis
%Match =34.9 

%Identity =54.1 %Similarity =76.4 
Matches = 172 Mismatches = 75 Conservative Sub.s = 71

216 246 276 306 336 366 396 426 

TLQVFHLS*LKL*QLQSSSS*VSITLLSMLLNLKNK*KWSKLSLTTIFALLFSSMLIYATPLIFTSIGGTFSERGGIVN 

= = |:::: = 1 = 11 lll=l==ll I I I I 1 = 11 
MDIVQILSIIVPATLVYAAPLILTALGGVFSERSGVVN

10 20 30 

456 486 516 546 576 606 636 666 

VGLEGIMVIGAFSGVVFNLEFASVFGDATPWISVLVGGLVGLIFSVIHAVATVNFRADHIISGTVLNLMAPSLAVFLIKV

= lll|:|:||l|: 1 = 111 I =1 I 11=1=1 I =11 = 111 I = = illl= =11 =l = = l =l = = h 

IGLEGLMIIGAFTSVLFNLFFGQELGAAAPWLSLIAAMAAGALFSLIHAAAAISFRADQTVSGVAINMLALGATLFIVKL
50 60 70 80 90 100 110 

696 726 756 786 816 846 876 906 

LYNKGQTDNIQESFGKFNFPILSDIPFVGDIFFKGTSLVGYIAILFSFLAWFILYKTRFGLRLRSVGEHPQAADTLGINV 

= 111111111 I I III =1 III =11 ::|: = llll = ll 1111 = 1111111 lllhllll 

IYGKAQTDKIPEPFYKTKIPGLGDIPVLGKIFFSDVYYTSILAIALAFISWFILFKTPFGLRIRSVGEHPMAADTMGINV 
130 140 150 160 170 180 190 

936 966 996 1026 1056 1086 1116 1146 

YLMRYSGVLISGFLGGIGGAVYAQSISVNFAATTILGPGFISLAAMIFGKWNPIGAMLASLFFGLSQSLAVIGSHLPLLS

' I III l|:|ll = :|| = ll 111 :| = = = l =11 I I I I = I I I = = I I I I = I I I I = I = I I I I = = I I I = = I I I 111 = 
YKMRYIGVMISGLFGGLGGGVYASTIALDFTHSTISGQGFIALAALVFGKWHPIGALGAALFFGFAQSLSIIGSLLPLFK
210 220 230 240 250 260 270 

1176 1206 1236 1266 1296 1326 1356 1386 

NIPTVYLQIAPYVLTIIVLAAFFGQAVAPKADGINYIKTK*IIORN*YKLVSFYCL*ICEKILCENFT*IIIQ*Q*NIKK*

=11 11= =111=111= I I 1=1 1111=1= III I 
DIPNVYMLMAPYILTILALTGFIGRADAPKANGVPYIKGKR 
290 300 310

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1519 

A DNA sequence (GBSx1609) was identified in S.agalactiae <SEQ ID 4673> which encodes the amino
acid sequence <SEQ ID 4674>. This protein is predicted to be ribose/galactose ABC transporter, permease 
protein (rbsC-1). Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have an uncleavable N-term signal seq



INTEGRAL Likelihood = -14.59 Transmembrane 205 - 221 ( 200 - 228)
INTEGRAL Likelihood = -13.69 Transmembrane 21 - 37 ( 13 - 45)
INTEGRAL Likelihood = -7.27 Transmembrane 302 - 318 ( 290 - 321)
INTEGRAL Likelihood = -7.17 Transmembrane 115 - 131 ( 111 - 138)
INTEGRAL Likelihood = -4.25 Transmembrane 251 - 267 ( 250 - 268)
INTEGRAL Likelihood = -2.97 Transmembrane 63 - 79 ( 63 - 80)
INTEGRAL Likelihood = -2.87 Transmembrane 333 - 349 ( 328 - 349)

Final Results

bacterial membrane Certainty=0.6838 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8831> which encodes amino acid sequence <SEQ ID 8832>
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
SRCFLG: 0

McG: Length of UR: 24 

Peak Value of UR: 3.06 

Net Charge of CR: 3 
McG: Discrim Score: 12.53 
GvH: Signal Score (-7.5): -5.31 

Possible site: 46 
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1 
ALOM program count: 7 value: -14.59 threshold: 0.0 
INTEGRAL Likelihood = -14.59 Transmembrane 196 - 212 ( 191 - 219)
INTEGRAL Likelihood = -13.69 Transmembrane 12 - 28 ( 4 - 36)
INTEGRAL Likelihood = -7.27 Transmembrane 293 - 309 ( 281 - 312)
INTEGRAL Likelihood = -7.17 Transmembrane 106 - 122 ( 102 - 129)
INTEGRAL Likelihood = -4.25 Transmembrane 242 - 258 ( 241 - 259)
INTEGRAL Likelihood = -2.97 Transmembrane 54 - 70 ( 54 - 71)
INTEGRAL Likelihood = -2.87 Transmembrane 324 - 340 ( 319 - 340)
PERIPHERAL Likelihood = 0.16 133
modified ALOM score: 3.42
icml HYPID: 7 CFP: 0.684

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6838 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15145 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 154/349 (44%) , Positives = 220/349 (62%) , Gaps = 6/349 (1%) 

Query: 10 MSKKAQKIAVPLISVVLGIILGAIIMLIFGYDPLWGYEGLFQTAFGSIKNIGEIFRAMGP 69



M K+ + VPLI+++LG+ GA+IML+ GY GY L+ FG I +GE R + P
Sbjct: 1 MVKRLSHLLVPLIAIILGLAAGALIMLVSGYSVASGYSALTOGIFGEIYYVGETIRQITP 60 

Query: 70 LILIALGFSVASRAGFFNIGLPGQALSGWIAAGWFALSHPDMPRPAMILCTIIIGIVAGG 129

IL L + A R G FNIG+ GQLGWAAW + DP + +1 AGG 
Sbjct: 61 YILSGLAVAFAFRTGLFNIGVEGQLDVGWTAAVWVGTAF-DGPAYIHLPLALITAAAAGG 119 

Query: 130 ITGAIPGILRAYLGTSEVIVTIMMNYIVLYSGNRIVQRVFPKSIMRTSDSSVYVSANASY 189 

+ G IPGIL+A EVIVTIMMNYI L+NI+V D+ + +AS 

Sbjct: 120 LWGFIPGILKARFYVHEVIVTIMMNYIALHMTNYIISNVLTDH QDKTGKIHESASL 175 

Query: 190 QTDWLSSLTNNSRINIGIFIAIIAVVLVWFLLNKTTLGFEIRSVGLNPNASEYAGMSAKR 249

++ +L +T+ SR+++GI +A++A V++WF++NK+T GFE+R+VG N +AS+YAGMS ++ 
Sbjct: 176 RSPFLEQITDYSRLHLGIIVALLAAVIMWFIINKSTKGFELRAVGFNQHASQYAGMSVRK 235

Query: 250 TIILSMIISGAFAGLGGVVEGLGTFENVFVQPSSLAIGFDGMAVSLLAANSPIGILFAAF 309

I+ SM+ISGAFAGL G +EGLGTFE V+ + +GFDG+AV+LL N+ +G++ AA
Sbjct: 236 NIMTSMLISGAFAGLAGAMEGLGTFEYAAVKGAFTGVGFDGIAVALLGGNTAVGWLAAC 295 

Query: 310 LFGVLSVGAPGMNI-AGIPPELIKVVTASIIFFVGVHYIIEYVIKPKKQ 357

LGL +GA M I +G+P E++ +V A II FV Y I +V+ K+ 
Sbjct: 296 LLGGLKIGALNMPIESGVPSEWDIVIAIIILFVASSYAIRFVMGKLKK 344 

A related DNA sequence was identified in S. pyogenes <SEQ ID 2149> which encodes the amino acid 
sequence <SEQ ID 2150>. Analysis of this protein sequence reveals the following: 



Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.6095 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 293/358 (81%) , Positives = 333/358 (92%) , Gaps = 1/358 (0%) 

Query: 6 RRREMSKKAQKIAVPLISVVLGIILGAIIMLIFGYDPLWGYEGLFQTAFGSIKNIGEIFR 65

RR+ MSK AQKIAVPLISV+LG +LGAIIM+IFGYDP+WGYEGLFQ AFGS+KNIGEIFR
Sbjct: 6 RRKVMSKNAQKIAVPLISVLLGFLLGAIIMVIFGYDPIWGYEGLFQIAFGSVKNIGEIFR 65

Query: 66 AMGPLILIALGFSVASRAGFFNIGLPGQALSGWIAAGWFALSHPDMPRPAMILCTIIIGI 125 

+MGPLILIALGF+VASRAGFFN+GL GQAL+GWI+AGWFAL +PDMPRP +IL T +IG+ 
Sbjct: 66 SMGPLILIALGFTVASRAGFFNVGLSGQALAGWISAGWFALLNPDMPRPLLILMTALIGM 125 

Query: 126 VAGGITGAIPGILRAYLGTSEVIVTIMMNYIVLYSGNAIVQRVFPKSIMRTSDSSVYVSA 185 

+AGGI GAIPGILRAYLGTSEVIVTIMMNYI+LY GNAIVQR +P+S+ ++ DS++ VS 
Sbjct: 126 IAGGIAGAIPGILRAYLGTSEVIVTIMMNYIILYVGNAIVQRGYPESVKQSIDSTIQVSD 185 

Query: 186 NASYQTDWLSSLTNNSRINIGIFIAIIAVVLVWFLLNKTTLGFEIRSVGLNPNASEYAGM 245

NASYQT WLS+LTNNSRINIGIF AIIA+ L+WFLLNKTTLGFEIRSVGLNP+ASEYAGM 
Sbjct: 186 NASYQTHWLSALTNNSRINIGIFFAIIAIALIWFLLNKTTLGFEIRSVGLNPHASEYAGM 245 

Query: 246 SAKRTIILSMIISGAFAGLGGVVEGLGTFENVFVQPSSLAIGFDGMAVSLLAANSPIGIL 305

S+KRTIILSMIISGA AGLGGVVEGLGTFENVFVQ SSLA+GFDGMAVSLLAANSP+GI
Sbjct: 246 SSKRTIILSMIISGALAGLGGVVEGLGTFENVFVQGSSLAVGFDGMAVSLLAANSPLGIF 305



Query: 306 FAAFLFGVLSVGAPGMNIAGIPPELIKVVTASIIFFVGVHYIIE-YVIKPKKQMKGGK 362 
F++FLFGVL++GAPGMNIAGIPPEL+KVVTASIIFFVG HY+IE Y+I+PKK +KGGK
Sbjct: 306 FSSFLFGVLNIGAPGMNIAGIPPELVKVVTASIIFFVGSHYLIERYIIRPKKLVKGGK 363

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1520 

A DNA sequence (GBSx1610) was identified in S.agalactiae <SEQ ID 4675> which encodes the amino
acid sequence <SEQ ID 4676>. This protein is predicted to be sugar ABC transporter, ATP-binding protein 
(mglA). Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3851 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9803> which encodes amino acid sequence <SEQ ID 9804> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



>GP:CAB15144 GB:Z99120 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 311/497 (62%) , Positives = 396/497 (79%) , Gaps = 1/497 (0%) 



Query: 14 VIEMKEITKKFGDFVANDHINLTVEKGEIHALLGENGAGKSTLMNMLAGLLEPTDGQIFI 73

VIEM I K F VAND+INL V+KGEIHALLGENGAGKSTLMN+L GL +P G+I +
Sbjct: 4 VIEMmiRKAFPGIVANDNINLQVKKGEIHALLGENGAGKSTLMNVLFGLYQPERGEIRV 63

Query: 74 NGQPVTIDSPSKSSQLGIGMVHQHFMLVEAFTVAENIVLGNETTQNGVLDIKTAAKEIKE 133

G+ V I+SP+K++ LGIGMVHQHFMLV+ FTVAENI+LG E + G +D K A +E+++
Sbjct: 64 RGEKVHINSPNKANDLGIGMVHQHFMLVDTFTVAENIILGKEPKKFGRIDRKRAGQEVQD 123

Query: 134 LSEKYGLSVNPNAKISDISVGAQQRVEILKTLYRGADILIFDEPTAVLTPSEIKELMTIM 193

+S++YGL ++P AK +DISVG QQR EILKTLYRGADILIFDEPTAVLTP EIKELM IM
Sbjct: 124 ISDRYGLQIHPEAKAADISVGMQQRAEILKTLYRGADILIFDEPTAVLTPHEIKELMQIM 183

Query: 194 KSLVKEGKSIILITHKLDEIRAVADKVTVIRRGKSIETVPVAGASSQQLAEMMVGRSVSF 253

K+LVKEGKSIILITHKL EI + D+VTVIR+GK I+T+ V + +LA +MVGR VSF
Sbjct: 184 KNLVKEGKSIILITHKLKEIMEICDRVTVIRKGKGIKTLDVRDTNQDELASLMVGREVSF 243

Query: 254 RTEKKEANPTDIILSVKDLVVEENRGGVIAVKNLSLDVRAGEIVGIAGIDGNGQSELIQA 313

+TEK+ A P +L++ + V++ R G+ V++LSL V+AGEIVGIAG+DGNGQSELI+A
Sbjct: 244 KTEKRAAQPGAEVIAIDGITVKDTR-GIETVRDLSLSVKAGEIVGIAGVDGNGQSELIEA 302

Query: 314 ITGLRKVTSGQIVIKGKDVTKFSSRQITELSVGHVPEDRHRDGLVLDMTMAENLALQTYY 373

+TGLRK SG I + GK + + R+ITE +GH+P+DRH+ GLVLD + EN+ LQ+YY
Sbjct: 303 VTGLRKTDSGTITLNGKQIQNLTPRKITESGIGHIPQDRHKHGLVLDFPIGENILLQSYY 362

Query: 374 KEPLSHKGILNFAKIKEYARQLMTEFDVRGAGEHVLARGFSGGNQQKAIIAREVDRDPDL 433

K+P S G+L+ ++ + AR L+TE+DVR E+ AR SGGNQQKAII RE+DR+PDL
Sbjct: 363 KKPYSALGVLHKGEMYKKARSLITEYDVRTPDEYTHARALSGGNQQKAIIGREIDRNPDL 422

Query: 434 LIVSQPTRGLDVGAIEYIHKRLIEERDKGKAVLVVSFELDEILNLSDRIAVIHDGKIQGI 493

LI +QPTRGLDVGAIE++HK+LIE+RD GKAVL++SFEL+EI+NLSDRIAVI +G+I
Sbjct: 423 LIAAQPTRGLDVGAIEFVHKKLIEQRDAGKAVLLLSFELEEIMNLSDRIAVIFEGRIIAS 482

Query: 494 VKPDQTNKQELGILMAG 510

V P +T +QELG+LMAG
Sbjct: 483 VVPQETTEQELGLLMAG 499

Identities = 75/242 (30%) , Positives = 128/242 (51%) , Gaps = 24/242 (9%) 



Query: 280 GVLAVKNLSLDVRAGEIVGIAGIDGNGQSELIQAITGLRKVTSGQIVIKGKDVTKFSSRQ 339 

G++A N++L V+ GEI + G +G G+S L+ + GL + G+I ++G+ V S + 
Sbjct: 16 GIVANDNINLQVKKGEIHALLGENGAGKSTLMNVLFGLYQPERGEIRVRGEKVHINSPNK 75


Query: 340 ITELSVGHVPEDRHRDGLVLD-MTMAENLALQTYYKEPLSHKGILNFAKI--KEYARQLM 396

+L +G V H+ +++D T+AEN+ L KEP F +1 K +++ 

Sbjct: 76 ANDLGIGMV HQHFMLVDTFTVAENIILG KEPKK FGRIDRKRAGQEVQ 122 

Query: 397 TEFDVRGAGEHVLARG--FSGGNQQKAIIAREVDRDPDLLIVSQPTRGL DVGAIEYI 451 

D G H A+ S G QQ+A I + + R D+LI +PT L ++ + I 
Sbjct: 123 DISDRYGLQIHPEAKAADISVGMQQRAEILKTLYRGADILIFDEPTAVLTPHEIKELMQI 182 

Query: 452 HKRLIEERDKGKAVLVVSFELDEILNLSDRIAVIHDGKIQGIVKPDQTNKQELGILMAGG 511
K L++E GK++++++ +L EI+ + DR+ VI GK + TN+ EL IM G
Sbjct: 183 MKNLVKE GKSIILITHKLKEIMEICDRVTVIRKGKGIKTLDVRDTNQDELASLMVGR 239

Query: 512 KI 513 
++ 

Sbjct: 240 EV 241 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4677> which encodes the amino acid
sequence <SEQ ID 4678>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3558 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 431/511 (84%) , Positives = 467/511 (91%) , Gaps = 1/511 (0%) 

Query: 10 MTQNVIEMKEITKKFGDFVANDHINLTVEKGEIHALLGENGAGKSTLMNMLAGLLEPTDG 69 

MTQ+VIEM+EITKKFGDFVANDHINL V KGEIHALLGENGAGKSTLMNMLAGLLEPT G 
Sbjct: 7 MTQHVIEMREITKKFGDFVANDHINLNVRKGEIHALLGENGAGKSTLMNMLAGLLEPTSG 66



Query: 70 QIFINGQPVTIDSPSKSSQLGIGMVHQHFMLVEAFTVAENIVLGNETTQNGVLDIKTAAK 129 

+I IN +PV IDSPSKS++LGIGMVHQHFMLVEAFTVAENI+LGNE +NG LD+ A+K
Sbjct: 67 EIVINDKPVQIDSPSKSAKLGIGMVHQHFMLVEAFTVAENIILGNEVVKNGCLDLNQASK 126 

Query: 130 EIKELSEKYGLSVNPNAKISDISVGAQQRVEILKTLYRGADILIFDEPTAVLTPSEIKEL 189 

+IK LSEKYGL++NP+AK+SDISVGAQQRVEILKTLYRGADILIFDEPTAVLTP+EIKEL
Sbjct: 127 DIKVLSEKYGLAINPSAKVSDISVGAQQRVEILKTLYRGADILIFDEPTAVLTPAEIKEL 186 

Query: 190 MTIMKSLVKEGKSIILITHKLDEIRAVADKVTVIRRGKSIETVPVAGASSQQLAEMMVGR 249 
MTIMK+LVKEGKSIILITHKLDEIRAVAD+VTVIRRGKSIETV VAGA+SQ LAEMMVGR 
Sbjct: 187 MTIMKNLVKEGKSIILITHKLDEIRAVADRVTVIRRGKSIETVDVAGATSQDLAEMMVGR 246

Query: 250 SVSFRTEKKEANPTDIILSVKDLVVEENRGGVIAVKNLSLDVRAGEIVGIAGIDGNGQSE 309

SVSF T KK A P D++LS+K+L V+ENR GV AVK LSLDVRAGEIVGIAGIDGNGQSE 
Sbjct: 247 SVSFTTSKKAAEPKDVVLSIKNLEVDENR-GVPAVKGLSLDVRAGEIVGIAGIDGNGQSE 305 

Query: 310 LIQAITGLRKVTSGQIVIKGKDVTKFSSRQITELSVGHVPEDRHRDGLVLDMTMAENLAL 369

LIQAITGLRKV SG I+IK +VT SSR+ITELSVGHVPEDRHRDGL+LD+++AEN AL 
Sbjct: 306 LIQAITGLRKVKSGSIMIKNNEVTHLSSRKITELSVGHVPEDRHRDGLILDLSLAENTAL 365 



Query: 370 QTYYKEPLSHKGILNFAKIKEYARQLMTEFDVRGAGEHVLARGFSGGNQQKAIIAREVDR 429

QTYYK+PLS GILN+ KI +YARQLM EFDVRGA E V ARGFSGGNQQKAIIAREVDR
Sbjct: 366 QTYYKQPLSQNGILNYTKINDYARQLMKEFDVRGANELVPARGFSGGNQQKAIIAREVDR 425 



Query: 430 DPDLLIVSQPTRGLDVGAIEYIHKRLIEERDKGKAVLVVSFELDEILNLSDRIAVIHDGK 489

DPDLLIVSQPTRGLDVGAIEYIHKRLI+ERDKGKAVLVVSFELDEILNLSDRIAVIHDGK
Sbjct: 426 DPDLLIVSQPTRGLDVGAIEYIHKRLIKERDKGKAVLVVSFELDEILNLSDRIAVIHDGK 485

Query: 490 IQGIVKPDQTNKQELGILMAGGKIEKEERDV 520
IQGIV P+ TNKQELGILMAGG I KEE V

Sbjct: 486 IQGIVSPENTNKQELGILMAGGSIHKEEGHV 516

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1521

A DNA sequence (GBSx1612) was identified in S.agalactiae <SEQ ID 4679> which encodes the amino
acid sequence <SEQ ID 4680>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> May be a lipoprotein 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15143 GB:Z99120 similar to ABC transporter (lipoprotein)
[Bacillus subtilis] 
Identities = 164/335 (48%) , Positives = 224/335 (65%) , Gaps = 10/335 (2%) 


Query: 18 LAACGHRGASKSGGKS-DSLKVAMVTDTGGVDDKSFNQSGWEGMQAWGKKNGLKKGA-GF 75

L ACG+ S G+ + VAMVTD GGVDDKSFNQS WEG+QA+GK+NGLKKG G+ 
Sbjct: 11 LGACGNSEKSSGSGEGKNKFSVAMVTDVGGVDDKSFNQSAWEGIQAFGKENGLKKGKNGY 70

Query: 76 DYFQSASESDYATNLDTAVSSGYKLIFGIGFSLHDAIDKAADNNKDVNYVIVDDVIKGKD 135

DY QS S++DY TNL+ + LI+G+G+ + D+I + AD K+ N+ I+D V+ KD 

Sbjct: 71 DYLQSKSDADYTTNLNKLARENFDLIYGVGYLMEDSISEIADQRKNTNFAIIDAVVD-KD 129

Query: 136 NVASVVFADNESAYLAGIAAAKTTKTKTVGFVGGMESEVITRFEKGFEAGVKSVDKSIKI 195 
NVAS+ F + E ++L G+AAA ++K+ +GFVGGMESE+I +FE GF AGV++V+ +

Sbjct: 130 NVASITFKEQEGSFLVGVAAALSSKSGKIGFVGGMESELIKKFEVGFRAGVQAVNPKAVV 189

Query: 196 KVDYAGSFGDAAKGKTIAAAQYASGADIVYQVAGGTGAGVFSEAKSRNESLKEADKVWVL 255
+V YAG F A GK A + Y SG D++Y AG TG GVF+EAK+ + + D VWV+ 
Sbjct: 190 EVKYAGGFDKADVGKATAESMYKSGVDVIYHSAGATGTGVFTEAKNLKKEDPKRD-VWVI 248

Query: 256 GVDRDQAAEGKYTSKDGKASNFVLASSIKEVGKSVELIATKTSKGKFPGGNVTTYGLKDG 315 

GVD+DQ AEG+ +G N L S +K+V VE + K S GKFPGG TYGL 
Sbjct: 249 GVDKDQYAEGQV---EGTDDNVTLTSMVKKVDTVVEDVTKKASDGKFPGGETLTYGLDQD 305


Query: 316 GVDIATT--NLSDDAVKAIKEAKAKIISGDIKVPS 348

GV I+ + NLSDD +KA+ + K KII G +++P+
Sbjct: 306 GVGISPSKQNLSDDVIKAVDKWKKKIIDG-LEIPA 339

A related DNA sequence was identified in S.pyogenes <SEQ ID 861> which encodes the amino acid
sequence <SEQ ID 862>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> May be a lipoprotein



Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.
Identities = 275/351 (78%) , Positives = 312/351 (88%) , Gaps = 3/351 (0%)

Query: 1 MNKKIAGIGLASIAVLSLAACGHRGASKSG--GKSDSLKVAMVTDTGGVDDKSFNQSGWE 58

MNKK G+GLAS+AVLSLAACG+RGASK G GK+D LKVAMVTDTGGVDDKSFNQS WE 
Sbjct: 1 MNKKFIGLGLASVAVLSLAACGITOGASKGGASGKTD-LKVAMVTDTGGVDDKSFNQSAWE 59

Query: 59 GMQAWGKKNGLKKGAGFDYFQSASESDYATNLDTAVSSGYKLIFGIGFSLHDAIDKAADN 118

G+Q+WGK+ GL+KG GFDYFQS SES+YATNLDTAVS GY+LI+GIGF+L DAI KAA + 
Sbjct: 60 GLQSWGKEMGLQKGTGFDYFQSTSESEYATNLDTAVSGGYQLIYGIGFALKDAIAKAAGD 119

Query: 119 NKDVNYVIVDDVIKGKDNVASVVFADNESAYLAGIAAAKTTKTKTVGFVGGMESEVITRF 178

N+ V +VI+DD+I+GKDNVASV FAD+E+AYLAGIAAAKTTKTKTVGFVGGME VITRF 
Sbjct: 120 NEGVKFVIIDDIIEGKDNVASVTFADHEAAYLAGIAAAKTTKTKTVGFVGGMEGTVITRF 179

Query: 179 EKGFEAGVKSVDKSIKIKVDYAGSFGDAAKGKTIAAAQYASGADIVYQVAGGTGAGVFSE 238 

EKGFEAGVKSVD +I++KVDYAGSFGDAAKGKTIAAAQYA+GAD++YQ AGGTGAGVF+E
Sbjct: 180 EKGFEAGVKSVDDTIQVKVDYAGSFGDAAKGKTIAAAQYAAGADVIYQAAGGTGAGVFNE 239

Query: 239 AKSRNESLKEADKVWVLGVDRDQAAEGKYTSKDGKASNFVLASSIKEVGKSVELIATKTS 298

AK+ NE EADKVWV+GVDRDQ EGKYTSKDGK +NFVLASSIKEVGK+V+LI + + 
Sbjct: 240 AKAINEKRSFJVDKVWVIGVDRDQKDEGOTSKDGKEANFVLASSIKEVGKAVQLINKQVA 299 

Query: 299 KGKFPGGNVTTYGLKDGGVDIATTNLSDDAVKAIKEAKAKIISGDIKVPSK 349 

KFPGG T YGLKDGGV+IATTN+S +AVKAIKEAKAKI SGDIKVP K
Sbjct: 300 DKKFPGGKTTVYGLKDGGVEIATTNVSKEAVKAIKEAKAKIKSGDIKVPEK 350 

A related DNA sequence was identified in S.pyogenes <SEQ ID 9061> which encodes amino acid sequence
<SEQ ID 9062>. Analysis of this protein sequence reveals the following: 
Possible site: 17 
>>> May be a lipoprotein

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 

Score = 414 bits (1052) , Expect = e-117 

Identities = 196/347 (56%) , Positives = 253/347 (72%) , Gaps = 2/347 (0%) 

Query: 1 MNKKVMSLGLVSTALFTLGGCTNNSAKQT--TDNSLKIAMITNQTGIDDKSFNQSAWEGL 58

MNKK+ +GL S A+ +L C + A ++ +SLK+AM+T+ G+DDKSFNQS WEG+ 
Sbjct: 1 MNKKIAGIGLASIAVLSLAACGHRGASKSGGKSDSLKVAMVTDTGGVDDKSFNQSGWEGM 60

Query: 59 QAWGKENKLEKGKGYDYFQSANESEFTTNLESAVTNGYNLVFGIGFPLHDAVEKVAANNP 118 

QAWGK+N L+KG G+DYFQSA+ES++ TNL++AV++GY L+FGIGF LHDA++K A NN 
Sbjct: 61 QAWGKKNGLKKGAGFDYFQSASESDYATNLDTAVSSGYKLIFGIGFSLHDAIDKAADNNK 120

D ++ IVDDVIKG+ NVAS+ F+D+E+AYLAG+ VGFVGGME +V+ RFEK 

Sbjct: 121 DVNYVIVDDVIKGKDNVASVVFADNESAYLAGIAAAKTTKTKTVGFVGGMESEVITRFEK 180

Query: 179 GFEAGVKSVDDTIKVRVAYAGSFXXXXXXXXXXXXXXXEGADVIYHAAGGTGAGVFSEAK 238 

GFEAGVKSVD +IK++V YAGSF GAD++Y AGGTGAGVFSEAK 

Sbjct: 181 GFEAGVKSVDKSIKIKVDYAGSFGDAAKGKTIAAAQYASGADIVYQVAGGTGAGVFSEAK 240

Query: 239 SINEKRKEEDKVWVIGVDRDQSEDGKYTTKDGKSANFVLTSSIKEVGKALVKVAVKTSED 298 

S NE KE DKVWV+GVDRDQ+ +GKYT+KDGK++NFVL SSIKEVGK++ +A KTS+ 
Sbjct: 241 SRNESLKEADKVWVLGVDRDQAAEGKYTSKDGKASNFVLASSIKEVGKSVELIATKTSKG 300

Query: 299 QFPGGQITTFGLKEGGVSLTTDALTQDTXXXXXXXXXXXXXGTITVP 345 

+FPGG +TT+GLK+GGV + T L+ D G I VP 

Sbjct: 301 KFPGGNVTTYGLKDGGVDIATTNLSDDAVKAIKEAKAKIISGDIKVP 347
SEQ ID 4680 (GBS211) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 49 (lane 6; MW 40kDa). 

The GBS211-His fusion product was purified (Figure 205, lane 8) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 259A) and FACS (Figure 259B). These tests confirm 
that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1522 

A DNA sequence (GBSx1613) was identified in S.agalactiae <SEQ ID 4681> which encodes the amino
acid sequence <SEQ ID 4682>. This protein is predicted to be cytidine deaminase (cdd). Analysis of this
protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2112 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9801> which encodes amino acid sequence <SEQ ID 9802>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 






>GP:CAB51906 GB:AJ237978 cytidine deaminase [Bacillus psychrophilus] 
Identities = 66/114 (57%) , Positives = 81/114 (70%) 

Query: 26 KASENAYVPYSKFPVGAALRTAEGKIFTGCNVENISYGLANCAERTAIFKAVSEGYKDFS 85
KA E AYVPYSKFPVGAAL +G I+ GCN+EN +Y + NCAERTA FKAVS+G + F
Sbjct: 12 KAREQAYVPYSKFPVGAALLAEDGTIYHGCNIENSAYSMTNCAERTAFFKAVSDGVRSFK 71 

Query: 86 EIAIYGNTEEPISPCGACRQVMVEFFNKNAKVTLIAKNGKTVETTVGELLPYSF 139

+A+ +TE P+SPCGACRQV+ EF N + V L G ETTV +LLP +F 
Sbjct: 72 ALAVVADTEGPVSPCGACRQVIAEFCNGSMPVYLTNLKGDIEETTVAKLLPGAF 125

A related DNA sequence was identified in S.pyogenes <SEQ ID 4683> which encodes the amino acid 
sequence <SEQ ID 4684>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0041 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15143 GB:Z99120 similar to ABC transporter (lipoprotein)

[Bacillus subtilis] 

Identities = 152/339 (44%), Positives = 223/339 (64%), Gaps = 11/339 (3%) 

Query: 8 LGLVSTALFTLGGCTNN SAKQTTDNSLKIAMITNQTGIDDKSFNQSAWEGLQAWGKE 64 

+ LV A LG C N+ S N +AM+T+ G+DDKSFNQSAWEG+QA+GKE

Sbjct: 1 MSLVIAAGTILGACGNSEKSSGSGEGKNKFSVAMVTDVGGVDDKSFNQSAWEGIQAFGKE 60 
Query: 65 NKLEKGK-GYDYFQSANESEFTTNLESAVTNGYHLVFGIGFPLHDAVEKVAANNPDNHFA 123

N L+KGK GYDY QS +++++TTNL ++L++G+G+ + D++ ++A + +FA 

Sbjct: 61 NGLKKGKNGYDYLQSKSDADYTTNLNKLARENFDLIYGVGYLMEDSISEIADQRKNTNFA 120

Query: 124 IVDDVIKGQKNVASITFSDHEAAYLAGVAAAKTTKTKQVGFVGGMEGDVVKRFEKGFEAG 183

I+D V+ + NVASITF + E ++L GVAAA ++K+ ++GFVGGME +++K+FE GF AG 
Sbjct: 121 IIDAVVD-KDNVASITFKEQEGSFLVGVAAALSSKSGKIGFVGGMESELIKKFEVGFRAG 179

Query: 184 VKSVDDTIKVRVAYAGSFADAAKGKTIAAAQYAEGADVIYHAAGGTGAGVFSEAKSINEK 243
V++V+ V V YAG F A GK A + Y G DVIYH+AG TG GVF+EAK++ ++

Sbjct: 180 VQAVNPKAVVEVKYAGGFDKADVGKATAESMYKSGVDVIYHSAGATGTGVFTEAKNLKKE 239

Query: 244 RKEEDKVWVIGVDRDQSEDGKYTTKDGKSANFVLTSSIKEVGKALVKVAVKTSEDQFPGG 303
+ D VWVIGVD+DQ +G+ +G N LTS +K+V + V K S+ +FPGG 
Sbjct: 240 DPKRD-VWVIGVDKDQYAEGQV---EGTDDNVTLTSMVKKVDTVVEDVTKKASDGKFPGG 295

Query: 304 QITTFGLKEGGVSLTTDA--LTQDTKKAIEAAKKAIIEG 340

+ T+GL + GV ++ L+ D KA++ KK II+G 

Sbjct: 296 ETLTYGLDQDGVGISPSKQNLSDDVIKAVDKWKKKIIDG 334 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/128 (68%) , Positives = 107/128 (82%) 

Query: 15 MGNIELKKIAVKASENAYVPYSKFPVGAALRTAEGKIFTGCNVENISYGLANCAERTAIF 74
MG +L AV+ASE AYVPYS FPVGAAL+T +G I+TGCN+EN+S+GL NC ERTAIF

Sbjct: 1 MGTTDLVSCAVQASEYAYVPYSHFPVGAALKTKDGTIYTGCNIENVSFGLTNCGERTAIF 60 

Query: 75 KAVSEGYKDFSEIAIYGNTEEPISPCGACRQVMVEFFNKNAKVTLIAKNGKTVETTVGEL 134
KA+S+G+K+ EIAIYG T +P+SPCGACRQVM EFF+ ++ VTLIAKNG+TVE TVG+L
Sbjct: 61 KAISDGHKELVEIAIYGETMQPVSPCGACRQVMAEFFDPSSLVTLIAKNGQTVEMTVGDL 120

Query: 135 LPYSFVDL 142 

L YSF DL 
Sbjct: 121 LLYSFTDL 128 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1523 

A DNA sequence (GBSx1614) was identified in S.agalactiae <SEQ ID 4685> which encodes the amino
acid sequence <SEQ ID 4686>. Analysis of this protein sequence reveals the following:

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2979 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9799> which encodes amino acid sequence <SEQ ID 9800>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11882 GB:Z99104 alternate gene name: ybaA-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 90/201 (44%) , Positives = 144/201 (70%) , Gaps = 5/201 (2%) 






Query: 1 MANMYYTENPNVEHDIHELNVKLLGESFSFLTDAGVFSKRMIDYGSQVLLNSLHF-EKNK 59

M+ YY+E P+V+ + + +L + F+F +D+GVFSK+ +D+GS++L++S E 
Sbjct: 1 MSEHYYSEKPSVKSNKQTWSFRLRNKDFTFTSDSGVFSKKEVDFGSRLLIDSFEEPEVEG 60 
Query: 60 SLLDLGCGYGPLGISLAK-VQGVKATMVDINTRALELAKKNATRNGVV-VEVFQSNIYEN 117

+LD+GCGYGP+G+SLA + M+D+N RA+EL+ +NA +NG+ V+++QS+++ N 

Sbjct: 61 GILDVGCGYGPIGLSLASDFKDRTIHMIDVNERAVELSNENAEQNGITNVKIYQSDLFSN 120

Query: 118 I--SKTFDYIISNPPIRAGKQVVHSIIEESICYLNTGGSLTIVIQKKQGAPSAKAKMLDT 175 

+ ++TF I++NPPIRAGK+VVH+I E+S +L G L IVIQKKQGAPSA K+ +
Sbjct: 121 VDSAQTFASILTNPPIRAGKKVVHAIFEKSAEHLKASGELWIVIQKKQGAPSAIEKLEEL 180 

Query: 176 FGNCDILKKDKGYYILRSEKV 196 

F +++K KGYYI++++KV 
Sbjct: 181 FDEVSVVQKKKGYYIIKAKKV 201

A related DNA sequence was identified in S.pyogenes <SEQ ID 4687> which encodes the amino acid 
sequence <SEQ ID 4688>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.4232 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 139/195 (71%) , Positives = 165/195 (84%) 

Query: 1 MANMYYTENPNVEHDIHELNVKLLGESFSFLTDAGVFSKRMIDYGSQVLLNSLHFEKNKS 60

M MYY ENP+ HDIHE+ V+LL F+FLTD+GVFSK+M+D+GSQVLL +L+F++N+ 
Sbjct: 12 MTKMYYDENPDSLHDIHEVKVELLNHPFTFLTDSGVFSKKMVDFGSQVLLKTLNFKENER 71

Query: 61 LLDLGCGYGPLGISLAKVQGVKATMVDINTRALELAKKNATRNGVVVEVFQSNIYENISK 120

+LDLGCGYGPLGISLAKVQ V AT+VDIN RAL+LA+KNAT N V V +FQSNIYENIS 
Sbjct: 72 VLDMCGYGPLGISLAKVQRVEATLVDINNRALDIjARKN^ 131 

Query: 121 TFDYIISNPPIRAGKQVVHSIIEESICYLNTGGSLTIVIQKKQGAPSAKAKMLDTFGNCD 180

F++IISNPPIRAGK+VVHSIIE+SI +L G LTIVIQKKQGAPSAKAKM FGN +
Sbjct: 132 HFEHIISNPPIRAGKRVVHSIIEKSIDFLVVNGDLTIVIQKKQGAPSAKAKMATIFGNVE 191

Query: 181 ILKKDKGYYILRSEK 195 

IL+KDKGYY+LRS K 
Sbjct: 192 ILRKDKGYYVLRSIK 206 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1524 

A DNA sequence (GBSx1615) was identified in S.agalactiae <SEQ ID 4689> which encodes the amino
acid sequence <SEQ ID 4690>. This protein is predicted to be pantothenate kinase (coaA). Analysis of this 
protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.5021 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06594 GB:AP001516 pantothenate kinase [Bacillus halodurans] 
Identities = 140/307 (45%) , Positives = 203/307 (65%) , Gaps = 5/307 (1%) 
Query: 4 EFINFDRISRENWKDLHQQSQALLTEKELESIKSLNDNINIQDVIDIYLPLINLIQIYKR 63

[midline illegible in source]
Sbjct: 8 [illegible in source] 67

Query: 64 SQENLSFSKAIFLKKENYQRPFIIGISGSVAVGKSTTSRLLQLLISRTFKDSHVELVTTD 123

[midline illegible in source]
Sbjct: 68 [illegible in source] 127

Query: 124 GFLYPNEKLIQNGILNRKGFPESYDMESLLNFLDTIKNGIT-AKIPIYSHEIYDIVPNQL 182

[midline illegible in source]
Sbjct: 128 [illegible in source] 187

Query: 183 QTIETPDFLILEGINVFQ-NQQNHRL YMNDYFDFS I YIDAENKQIEEWYLQRFNSLL 238

[midline illegible in source]
Sbjct: 188 [illegible in source] 247

Query: 239 QLAEADPSNYYHKFTQIPPHKAMELAKDIWKTINLVNLEKYIEPTRNRADFIIHKGKHHK 298

[midline illegible in source]
Sbjct: 248 [illegible in source] 307

Query: 299 IDEIYLK 305

IDE+ L+
Sbjct: 308 IDEVKLR 314





A related DNA sequence was identified in S.pyogenes <SEQ ID 4691> which encodes the amino acid
sequence <SEQ ID 4692>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4790 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 219/306 (71%) , Positives = 269/306 (87%)



Query: 1 MNNEFINFDRISRENWKDLHQQSQALLTEKELESIKSLNDNINIQDVIDIYLPLINLIQI 60

M+NEFINF++ISRE+WK LHQ+++ALLT++EL+SI SLNDNI+I DVIDIYLPLINLIQ+
Sbjct: 1 MSNEFINFEKISRESWKTLHQKAKALLTQEELKSITSLNDNISINDVIDIYLPLINLIQV 60

Query: 61 YKRSQENLSFSKAIFLKKENYQRPFIIGISGSVAVGKSTTSRLLQLLISRTFKDSHVELV 120

YK +QENLSFSK++FLKK+ RPFIIGISGSVAVGKSTTSRLLQLL+SRT +S VELV
Sbjct: 61 YKIAQENLSFSKSLFLKKDIQLRPFIIGISGSVAVGKSTTSRLLQLLLSRTHPNSQVELV 120

Query: 121 TTDGFLYPNEKLIQNGILNRKGFPESYDMESLLNFLDTIKNGITAKIPIYSHEIYDIVPN 180

TTDGFLYPN+ LI+ G+LNRKGFPESY+ME LL+FLD+IKNG TA P+YSH+IYDI+PN
Sbjct: 121 TTDGFLYPNQFLIEQGLLNRKGFPESYNMELLLDFLDSIKNGQTAFAPVYSHDIYDIIPN 180

Query: 181 QLQTIETPDFLILEGINVFQNQQNHRLYMNDYFDFSIYIDAENKQIEEWYLQRFNSLLQL 240

Q Q+ PDFLI+EGINVFQNQQN+RLYM+DYFDFSIYIDA++ IE WY++RF S+L+L
Sbjct: 181 QKQSFtWPDFLIVEGINVFQNQQNNRLYMSDYFDFSIYIDADSSHIETWYIERFLSILKL 240

Query: 241 AEADPSNYYHKFTQIPPHKAMELAKDIWKTINLVNLEKYIEPTRNRADFIIHKGKHHKID 300

A+ DP NYY ++ Q+P +A+ A+++WKT+NL NLEK+IEPTRNRA+ I+HK HKID
Sbjct: 241 AKRDPHNYYAQYAQLPRSEAIAFARNWKTVmFJfLEKFIEPTRNRAELILHKSADHKID 300

Query: 301 EIYLKK 306

EIYLKK
Sbjct: 301 EIYLKK 306





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1525

A DNA sequence (GBSx1616) was identified in S.agalactiae <SEQ ID 4693> which encodes the amino
acid sequence <SEQ ID 4694>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3866 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05058 GB:AP001511 ribosomal protein S20 (BS20) [Bacillus halodurans] 
Identities = 47/86 (54%) , Positives = 59/86 (67%) , Gaps = 7/86 (8%) 


Query: 3 VKTLANIKSAIKRAELNVKQNEKNSAQKSAMRTAIKAFEA NPSEELYRA ASSS 55 

+K ANIKSAIKR + N K+ +N++ KSA+RTAIK FEA N E +A A+ 
Sbjct: 1 MKGNANIKSAIKRWTNEKRRIQNASWSALRTAIKQFEAKVENNDAEAAKAAFVFATKK 60 

Query: 56 IDKAASKGLIHTNKASRDKARLATKL 81

+DKAA+KGLIH N ASR K+RLA KL 
Sbjct: 61 LDKAANKGLIHKNAASRQKSRLAKKL 86

A related DNA sequence was identified in S.pyogenes <SEQ ID 4695> which encodes the amino acid 
sequence <SEQ ID 4696>. Analysis of this protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3872 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 76/82 (92%) , Positives = 78/82 (94%)

Query: 1 MEVKTLANIKSAIKRAELNVKQNEKNSAQKSAMRTAIKAFEANPSEELYRAASSSIDKAA 60 

+EVKTLANIKSAIKRAELNVK NEKNSAQKSAMRTAIKAFEANPSEEL+RAASSSIDKA 
Sbjct: 1 LEVKTLANIKSAIKRAELNVKANEKNSAQKSAMRTAIKAFEANPSEELFRAASSSIDKAE 60 


Query: 61 SKGLIHTNKASRDKARLATKLG 82 

SKGLIH NKASRDKARLA KLG 
Sbjct: 61 SKGLIHKNKASRDKARLAAKLG 82

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
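The Identities count of a pairwise alignment is the number of columns in which the Query and Sbjct residues are the same. A minimal Python sketch, assuming the aligned strings are transcribed exactly with gaps written as `-`; `count_identities` is a hypothetical helper, and the input is the Example 1525 GAS/GBS alignment shown above (the last Sbjct line taken as `SKGLIHKNKASRDKARLAAKLG`). Rounding the percentage down gives the reported 76/82 (92%).

```python
def count_identities(query, sbjct):
    """Count aligned columns holding the same residue; gap columns never count."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# Example 1525 GAS/GBS alignment, transcribed as two (Query, Sbjct) blocks.
blocks = [
    ("MEVKTLANIKSAIKRAELNVKQNEKNSAQKSAMRTAIKAFEANPSEELYRAASSSIDKAA",
     "LEVKTLANIKSAIKRAELNVKANEKNSAQKSAMRTAIKAFEANPSEELFRAASSSIDKAE"),
    ("SKGLIHTNKASRDKARLATKLG",
     "SKGLIHKNKASRDKARLAAKLG"),
]

ident = sum(count_identities(q, s) for q, s in blocks)
length = sum(len(q) for q, _ in blocks)
print(f"Identities = {ident}/{length} ({100 * ident // length}%)")
# prints: Identities = 76/82 (92%)
```

Counting positives (the `+` columns) would additionally need a substitution matrix, which this sketch deliberately leaves out.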

Example 1526 

A DNA sequence (GBSx1617) was identified in S.agalactiae <SEQ ID 4697> which encodes the amino
acid sequence <SEQ ID 4698>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.99 Transmembrane 31 - 47 ( 25 - 51)

Final Results

bacterial membrane Certainty=0.5394 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35851 GB:AF086736 amino acid-binding protein Abp 
[Streptococcus uberis] 
Identities = 169/269 (62%), Positives = 203/269 (74%), Gaps = 2/269 (0%) 



Query: 29 KMILLTIIFGLFMIILSACGMSNKEMAGIDNWEHYQKEKKITIGFDNTFVPMGFESRSGD 88

K ILLT + + L ACG S+ A D W+ Y+KEK IT+GFDNTFVPMGF+ SG
Sbjct: 4 KKILLTTLALASTLFLVACGKSSA--AKTDQWDTYKKEKSITLGFDNTFVPMGFKDESGK 61

Query: 89 YTGFDIDLANAVFKEYGISVKWQPINWDMKETELNNGNIDLIWNGYSKTAERAKKVAFTN 148

TGFD++LA AVF+EYGI VK+QPINWD+KETEL NG ID+IWNGYS T ER KVAF+
Sbjct: 62 NTGFDVEIAKAVFQEYGIKVKFQPINWDLKETELKNGKIDMIWNGYSVTKERQAKVAFST 121

Query: 149 PYMNNHQVIVTKTSSHINSIKDMKGKKLGAQSGSSGFDAFNAKPDILKKFVKGKEAVQYD 208

PYM N QV+VTK SS+I S MKGK LGAQSGSSG+DAF + P +LK VK +A QY+
Sbjct: 122 PYMKNEQVLVTKKSSNITSFAAMKGKVLGAQSGSSGYDAFTSNPKVLKDIVKDNDATQYE 181

Query: 209 TFTQALIDLKNNRIDGLLIDEVYANYYLKQEGNIKAYYFVKTAYQGENFVVGARKVDRRL 268

TF QA IDLKN+RIDGLLID+VYANYYLKQEG + Y VK+ + GE+F VG RK D+ L
Sbjct: 182 TFIQAFIDLKNDRIDGLLIDKVYANYYLKQEGELTNYNIVKSEFDGEDFAVGVRKEDKIL 241

Query: 269 IEKINKAFKQLHNKGRFQKISYKWFGEDV 297

++ IN AF +L+ G+FQ+IS KWFGEDV
Sbjct: 242 LKNINSAFTKLYKTGKFQEISQKWFGEDV 270





A related DNA sequence was identified in S.pyogenes <SEQ ID 4699> which encodes the amino acid 
sequence <SEQ ID 4700>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC35851 GB:AF086736 amino acid-binding protein Abp 
[Streptococcus uberis] 
Identities = 176/277 (63%) , Positives = 220/277 (78%) , Gaps = 1/277 (0%) 



Query: 1 MIIKKRTVAILAIASSFFLVACQATKSLKSGDAWGVYQKQKSITVGFDNTFVPMGYKDES 60

M +KK + LA+AS+ FLVAC + + K+ D W Y+K+KSIT+GFDNTFVPMG+KDES
Sbjct: 1 MNLKKILLTTLALASTLFLVACGKSSAAKT-DQWDTYKKEKSITLGFDNTFVPMGFKDES 59

Query: 61 GRCKGFDIDLAKEVFHQYGLKVNFQAINWDMKEAELNNGKIDVIWNGYSITKERQDKVAF 120

G+ GFD++LAK VF +YG+KV FQ INWD+KE EL NGKID+IWNGYS+TKERQ KVAF
Sbjct: 60 GKNTGFDVEIAKAVFQEYGIKVKFQPINWDLKETELKNGKIDMIWNGYSVTKERQAKVAF 119

Query: 121 TDSYMRNEQIIVVKKRSDIKTISDMKHKVLGAQSASSGYDSLLRTPKLLKDFIKNKDANQ 180

+ YM+NEQ++V KK S+I + + MK KVLGAQS SSGYD+ PK+LKD +K+ DA Q
Sbjct: 120 STPYMKNEQVLVTKKSSNITSFAAMKGKVLGAQSGSSGYDAFTSNPKVLKDIVKDNDATQ 179

Query: 181 YETFTQAFIDLKSDRIDGILIDKVYANYYLAKEGQLENYRMIPTTFENEAFSVGLRKEDK 240

YETF QAFIDLK+DRIDG+LIDKVYANYYL +EG+L NY ++ + F+ E F+VG+RKEDK
Sbjct: 180 YETFIQAFIDLKNDRIDGLLIDKVYANYYLKQEGELTNYNIVKSEFDGEDFAVGVRKEDK 239

Query: 241 TLQAKINRAFRVLYQNGKFQAISEKWFGDDVATANIK 277

L IN AF LY+ GKFQ IS+KWFG+DVAT N+K
Sbjct: 240 ILLKNINSAFTKLYKTGKFQEISQKWFGEDVATENVK 276
An alignment of the GAS and GBS proteins is shown below.

Identities = 151/266 (56%) , Positives = 189/266 (70%) , Gaps = 4/266 (1%) 

Query: 32 LLTIIFGLFMIILSACGMSNKEMAGIDNWEHYQKEKKITIGFDNTFVPMGFESRSGDYTG 91
+L I F++ AC + K + D W YQK+K IT+GFDNTFVPMG++ SG G

Sbjct: 10 ILAIASSFFLV---AC-QATKSLKSGDAWGVYQKQKSITVGFDNTFVPMGYKDESGRCKG 65

Query: 92 FDIDLANAVFKEYGISVKWQPINWDMKETELNNGNIDLIWNGYSKTAERAKKVAFTNPYM 151
FDIDLA VF +YG+ V +Q INWDMKE ELNNG ID+IWNGYS T ER KVAFT+ YM
Sbjct: 66 FDIDLAKEVFHQYGLKVNFQAINWDMKEAELNNGKIDVIWNGYSITKERQDKVAFTDSYM 125

Query: 152 NNHQVIVTKTSSHINSIKDMKGKKLGAQSGSSGFDAFNAKPDILKKFVKGKEAVQYDTFT 211

N Q+IV K S I +I DMK K LGAQS SSG+D+ P +LK F+K K+A QY+TFT
Sbjct: 126 RNEQIIVVKKRSDIKTISDMKHKVLGAQSASSGYDSLLRTPKLLKDFIKNKDANQYETFT 185


Query: 212 QALIDLKNNRIDGLLIDEVYANYYLKQEGNIKAYYFVKTAYQGENFVVGARKVDRRLIEK 271

QA IDLK++RIDG+LID+VYANYYL +EG ++ Y + T ++ E F VG RK D+ L K 
Sbjct: 186 QAFIDLKSDRIDGILIDKVYANYYLAKEGQLENYRMIPTTFENEAFSVGLRKEDKTLQAK 245 

Query: 272 INKAFKQLHNKGRFQKISYKWFGEDV 297

IN+AF+ L+ G+FQ IS KWFG+DV 
Sbjct: 246 INRAFRVLYQNGKFQAISEKWFGDDV 271

A related GBS gene <SEQ ID 8833> and protein <SEQ ID 8834> were also identified. Analysis of this
protein sequence reveals the following:

Lipop Possible site: 22 Crend: 4 

Sequence Pattern: CGMS 
SRCFLG: 0 

McG: Length of UR: 22 
Peak Value of UR: 3.05

Net Charge of CR: 2 
McG: Discrim Score: 11.16 
GvH: Signal Score (-7.5): -1.96 
Possible site: 24 
>>> May be a lipoprotein

Amino Acid Composition: calculated from 23 
ALOM program count: 0 value: 8.96 threshold: 0.0 
PERIPHERAL Likelihood =8.96 68 
modified ALOM score: -2.29 






*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

62.2/75.8% over 270aa
Streptococcus uberis

GP|3603430| amino acid-binding protein Abp Insert characterized

ORF00904(385 - 1203 of 1503)

GP|3603430|gb|ARC35851.1| |AF086736 (4 - 274 of 277) amino acid-binding protein Abp
{Streptococcus uberis}

%Match = 34.8

%Identity = 62.1 %Similarity = 75.7

Matches = 169 Mismatches = 65 Conservative Sub.s = 37

153 183 213 243 273 303 333 363

FHYLGGKSNVSH*LTO**LIHRLLVMMSQLALLIQSCVKK*KN*FYKIEKQV*HKL**HMIFNLLKVYLIRFSKLILSRL 



393 423 453 483 513 543 573 603 

GGRLLTHKNILLTIIFGLFMIILSACGMSNKEMAGIDNWEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVF 



WO 02/34771 



PCT/GB01/04789 



-1695- 

: I llll : : I III h I h hill 11=11111111111= II llll==ll III 

MNLKKILLTTLALASTLFLVACGKSS--AAKTDQWDTYKKEKSITLGFDNTFVPMGFKDESGKNTGFDVEIAKRVF
10 20 30 40 50 60 70 

633 663 693 723 753 783 813 843 

KEYGISVKWQPIOTroMKETELNNGNIDLIWNGYSKTAERAKKVAFTO 

= im ii = iiiiii = iiiii ii ihiinii i ii mi-- iii i 11 = 111 ii--i i iiii mill 

QEYGIKVKFQPINWDLKETELKNGKIDMITOGYSOTKERQ 

90 100 110 120 130 140 150 

873 903 933 963 993 1023 1053 1083 

SSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLK^ 

111=111 = I =11 II =1 11=11 ll=llll|:|lllllll=lllllllllll =111== 11=1 II 
SSGYDAFTSNPKmKDIVKDNDATQYETFIQAFIDLKMJRIDGLLIDKOTMTY^LKQEGELmraiVKSEFDGEDFAVGV 
170 180 190 200 210 220 230 

1113 1143 1173 1203 1233 1263 1293 1323 

RKTORRLIEKINKAFKQLHNKGRFQKISYKWFGEDWSKE*KTRNFS*SFILRKH*IKNIDISDVF*VN*PSLVARRALS 

ii i= i = = ii ii =1= 1 = 11 = 11 mim == i 

RKEDKILLKNINSAFTKLYKTGKFQEISQKWFGEDVATENVKK
250 260 270 

SEQ ID 8834 (GBS225) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 10; MW 32kDa). The GBS225-His fusion product was purified (Figure 
205, lane 7) and used to immunise mice. The resulting antiserum was used for FACS (Figure 266), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1527 

A DNA sequence (GBSx1618) was identified in S.agalactiae <SEQ ID 4701> which encodes the amino
acid sequence <SEQ ID 4702>. This protein is predicted to be an arginine ABC transporter ATP-binding
protein (glnQ). Analysis of this protein sequence reveals the following:

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3229 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
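Each "Final Results" block lists one certainty value per bacterial compartment, and the compartment marked "(Affirmative)" is the predicted localisation. A minimal sketch of reading such a block follows, assuming the line layout `bacterial <compartment> Certainty=<value> (<call>)` with the number written without internal spaces (e.g. `Certainty=0.3229`). The pattern and function names are hypothetical and are not part of the PSORT program.

```python
import re

# Hypothetical pattern for one "Final Results" line, e.g.
# "bacterial cytoplasm Certainty=0.3229 (Affirmative) < succ>"
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\s+"
    r"Certainty=(\d+\.\d+)\s*\((Affirmative|Not Clear)\)"
)


def predicted_localisation(final_results):
    """Return (compartment, certainty) for the best Affirmative call, or None."""
    best = None
    for match in RESULT_LINE.finditer(final_results):
        compartment = match.group(1)
        certainty = float(match.group(2))
        call = match.group(3)
        if call == "Affirmative" and (best is None or certainty > best[1]):
            best = (compartment, certainty)
    return best


if __name__ == "__main__":
    block = """
bacterial cytoplasm Certainty=0.3229 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
"""
    print(predicted_localisation(block))  # ('cytoplasm', 0.3229)
```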

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB49429 GB:U73111 glutamine transport ATP-binding protein GLNQ 
[Salmonella typhimurium] 
Identities = 94/210 (44%), Positives = 146/210 (68%), Gaps = 3/210 (1%)

Query: 1 MLELKNISKCYGQKEIFKDFNLTVEEGKILSLVGPSGGGKTTLLRMLAGLEKIDSGTIVH 60 

M+E KN+SK +G ++ + +L + +G+++ ++GPSG GK+TLLR + LE+I SG ++ 
Sbjct: 1 MIEFKNVSKHFGPTQVLHNIDLNIRQGEvWIIGPSGSGKSTLLRCINKLEEITSGDLIV 60 

Query: 61 DGKEVS VDHLETLNLLGFVFQDFQLFPHLTVLDNLILSPVKTMGLSKEIAKEKALVL 117

DG +V+ VD G VFQ F LFPHLT L+N++ P++ G+ KE A+++A L 

Sbjct: 61 DGLKVNDPKVDERLIRQFAGMVFQQFYLFPHLTALENVMFGPLRTOGVKKEEAEKQAKAL 120 

Query: 118 LERLGLKDHALVYPFSLSGGQKQRVALARAMMIDPQIIGYDEPTSALDPELRQEVEKLIL 177 

L ++GL + A YP LSGGQ+QRVA+ARA+ + P+++ +DEPTSALDPELR EV K++ 
Sbjct: 121 LAKVGIAERAHHYPSELSGGQQQRVAIARAIAVKPKMMLFDEPTSALDPELRHEVLKVMQ 180

Query: 178 QNRETGMTQIVVTHDLQFAESISDTILKIN 207
E GMT ++VTH++ FAE ++ ++ I+
Sbjct: 181 DLAEEGMTMVIVTHEIGFAEKVASRLIFID 210

A related DNA sequence was identified in S.pyogenes <SEQ ID 4703> which encodes the amino acid 
sequence <SEQ ID 4704>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2146 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 164/209 (78%), Positives = 183/209 (87%)

Query: 1 MLELKNISKCYGQKEIFKDFNLTVEEGKILSLVGPSGGGKTTLLRMLAGLEKIDSGTIVH 60 

MLELKNISK +GQK IF FNLTV+ +G+ +LSLVGPS GGKTTLLRMLAGLE IDSG + + 
Sbjct: 1 MLELKNISKQFGQKTIFDGFNLTVQDGEVLSLVGPSSGGKTTLLRMLAGLESIDSGQVFY 60 


Query: 61 DGKEVSVDHLETLNLLGFVFQDFQLFPHLTVLDNLILSPVKTMGLSKEIAKEKALVLLER 120 

+G++V +DHLE NLLGFVFQDFQLFPHLTVLDNL LSP TMG K AKEKAL LL R 
Sbjct: 61 NGEDVGIDHLENRNLLGFVFQDFQLFPHLTVLDNLTLSPTITMGKQKADAKEKALDLLAR 120 

Query: 121 LGLKDHALVYPFSLSGGQKQRVALARAMMIDPQIIGYDEPTSALDPELRQEVEKLILQNR 180

LGLK+HA VYP+SLSGGQKQRVALARAMMIDPQI IGYDEPTSALDPELRQ VE LI+QNR
Sbjct: 121 LGLKEHAQVYPYSLSGGQKQRVAIARAMMIDPQIIGYDEPTSALDPELRQTVEALIVQNR 180

Query: 181 ETGMTQIVVTHDLQFAESISDTILKINPK 209
E G+TQIVVTHDL FAE+ISD I+++NPK

Sbjct: 181 EMGITQIVVTHDLVFAEAISDRIIRVNPK 209

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1528

A DNA sequence (GBSx1619) was identified in S.agalactiae <SEQ ID 4705> which encodes the amino
acid sequence <SEQ ID 4706>. This protein is predicted to be an amino acid ABC transporter permease
protein (glnP). Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.12 Transmembrane 102 - 118 ( 96 - 120)

Final Results

bacterial membrane Certainty=0.4248 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9341> which encodes amino acid sequence <SEQ ID 9342>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA98402 GB:AP002545 ABC amino acid transporter permease
[Chlamydophila pneumoniae J138]
Identities = 55/127 (43%), Positives = 83/127 (65%), Gaps = 1/127 (0%)

Query: 3 AAIIAFTMNYAAYFAEIFRGGIESIPKGQYEAAKVLKFSKFQTVWYIVLPQVFKIVLPSV 62

A IIA +MN AAY AE RGGI S+ GQ+E+A VL + K+Q YI+ PQVFK +LPS+ 
Sbjct: 89 AGIIALS^SAAyLAENIRGGINSLSIGQWESflMVLGYKKYQIFVYIIYPQVFKNILPSL 148

Query: 63 FNETITLVKDSSLVYILGVGDLLLESKTAANRDATLAPMF-IAGGIYLLLIGLLTILSKQ 121

NE ++L+K+SS++ ++GV +L +K +R+ M+ I G+Y L+ + +S+ 

Sbjct: 149 TI^FVSLIKESSILMWGVPELTKVTKDIVSREIiNPMEMYLICAGLYFLMTSSFSCISRL 208 

Query: 122 VEKRFNY 128 

EKR +Y 
Sbjct: 209 SEKRRSY 215 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4707> which encodes the amino acid 
sequence <SEQ ID 4708>. Analysis of this protein sequence reveals the following: 

Possible site: 34
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -11.57 Transmembrane 21 - 37 ( 7 - 44)

INTEGRAL Likelihood = -10.93 Transmembrane 185 - 201 ( 178 - 206)
INTEGRAL Likelihood = -3.29 Transmembrane 63 - 79 ( 62 - 81)

Final Results

bacterial membrane Certainty=0.5628 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB05181 GB:AP001512 ABC transporter (permease) [Bacillus halodurans]

Identities = 86/206 (41%), Positives = 126/206 (60%), Gaps = 1/206 (0%)

Query: 4 ICjQ^PSLLIXSMvTLQVFFIVIILSIPIjGAII^LMKIPFKPLQWFLTLYVWMMRGTPL 63
IQ +P +L+G VTLQ + ++ + LG +LA ++ +WF Y + RGTPL

Sbjct: 8 IQPFMPFMLEGVWVTLQFVSVSLLFGLVLGIVLAIFKISKYRLFRWFADFYTSIFRGTPL 67

Query: 64 LLQLIFFYYVLPSVGISFDRMPAAILAFTLNYAAYFAEIFRGGIEAIPKGQYEAAKVLKL 123

+LQL+ Y LP G+ + AA LAF LN AAY +EI R GI+A+ KGQ EAA+ L + 
Sbjct: 68 ILQLLMIYLALPQFGVDISQFQAAFLAFGLNSAAYVSEIIRAGIQAVDKGQREAAEALGI 127 


Query: 124 KPLQTIRYIILPQVFKIVLPSVFNEVINLVKDSSLVYVLGVGDLL-LASKTAANRDATLA 182 

+ IILPQ + +LP++FNE INL K+S++V V+GV DL+ A T+A L 
Sbjct: 128 PYRPMMLRIILPQAMRNILPALFNEFINLTKESAIVSVIGVTDLMRRAQITSAETYLYLE 187

Query: 183 PMFIAGLIYLLLIGLVTIISKQVEKR 208

P+ GLIY +L+ +T+I + +E+R
Sbjct: 188 PLLFVGLIYYVLVMGLTVIGRLLERR 213

An alignment of the GAS and GBS proteins is shown below.

Identities = 112/130 (86%), Positives = 121/130 (92%)

Query: 1 MPAAIIAFTMNYAAYFAEIFRGGIESIPKGQYEAAKVLKFSKFQTVWYIVLPQVFKIVLP 60

MPAAI+AFT+NYAAYFAEIFRGGIE+IPKGQYEAAKVLK QT+ YI+LPQVFKIVLP
Sbjct: 84 MPAAILAFTLNYAAYFAEIFRGGIEAIPKGQYEAAKVLKLKPLQTIRYIILPQVFKIVLP 143

Query: 61 SVFNETITLVKDSSLVYILGVGDLLLESKTAANRDATLAPMFIAGGIYLLLIGLLTILSK 120

SVFNE I LVKDSSLVY+LGVGDLLL SKTAANRDATLAPMFIAG IYLLLIGL+TI+SK
Sbjct: 144 SVFNEVINLVKDSSLVYVLGVGDLLLASKTAANRDATLAPMFIAGLIYLLLIGLVTIISK 203

Query: 121 QVEKRFNYYK 130

QVEKRFNYY+
Sbjct: 204 QVEKRFNYYQ 213

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
Example 1529

A DNA sequence (GBSx1620) was identified in S.agalactiae <SEQ ID 4709> which encodes the amino
acid sequence <SEQ ID 4710>. This protein is predicted to be minidiscs. Analysis of this protein sequence
reveals the following:

Possible site: 61

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.66 Transmembrane 44 - 60 ( 39 - 66)

INTEGRAL Likelihood = -7.96 Transmembrane 129 - 145 ( 123 - 147)

INTEGRAL Likelihood = -5.15 Transmembrane 13 - 29 ( 9 - 33)

INTEGRAL Likelihood = -2.39 Transmembrane 94 - 110 ( 94 - 110)

Final Results

bacterial membrane Certainty=0.4864 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF49688 GB:AE003532 mnd gene product [Drosophila melanogaster]
Identities = 48/145 (33%), Positives = 78/145 (53%), Gaps = 8/145 (5%)

Query: 7 IKQTYGLMTTIAMIVGVVIGSGIYFKVDDILKFTGGDVFLGMVILVLGSFSIVFGSLSIS 66

+K+ GL+ +A+IVGV++GSGI + +LKF+ G + +++ VL + G+L +

Sbjct: 39 LKKQIGLLDGVAIIVGVTVGSGIFVSPKGVLKFS-GSIGQSLIVWVLSGVLSMVGALCYA 97

Query: 67 ELAIRTSESGGIFSYYEKYVSPALAATLGLFASFLYL-PTLTAIVSWVAAFYTLGE 121

EL +SGG ++Y P L A L L+ + L L PT AI + A Y L

Sbjct: 98 ELGTMIPKSGGDYAYIGTAFGP-LPAFLYLWVALLILVPTGNAITALTFAIYLLKPFWPS 156

Query: 122 -SSSLESQIILAAVYILALSLMNIF 145
+ +E+ +LAA I L+L+N +

Sbjct: 157 CDAPIEAVQLLAAAMICVLTLINCY 181

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1530

A DNA sequence (GBSx1621) was identified in S.agalactiae <SEQ ID 4711> which encodes the amino
acid sequence <SEQ ID 4712>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1531

A DNA sequence (GBSx1622) was identified in S.agalactiae <SEQ ID 4713> which encodes the amino
acid sequence <SEQ ID 4714>. This protein is predicted to be a TRK potassium uptake system protein.
Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.06 Transmembrane 232 - 248 ( 232 - 248)

Final Results

bacterial membrane Certainty=0.1022 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8835> which encodes amino acid sequence <SEQ ID 8836>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: -4.65
GvH: Signal Score (-7.5): -3.64
Possible site: 27
>>> Seems to have no N-terminal signal sequence

ALOM program count: 1 value: -0.06 threshold: 0.0

INTEGRAL Likelihood = -0.06 Transmembrane 228 - 244 ( 228 - 244)
PERIPHERAL Likelihood = 1.27 428
modified ALOM score: 0.51

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.1022 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB90401 GB:AE001046 TRK potassium uptake system protein
(trkA-2) [Archaeoglobus fulgidus]

Identities = 136/446 (30%), Positives = 238/446 (52%), Gaps = 12/446 (2%)

Query: 5 MRIIVVGGGKVGTALCRSLVAEKHDVVLIEKKENVLKRVTKQHDIMGIVGNGANYKILEQ 64
MRI++ G G+VG L SL A HDV++IEK + +RV++ D++ I GN AN K+D
Sbjct: 1 MRIVIAGAGEVGYHLAMSL-APNHDVIIIEKDVSRFERVSEL-DWAINGNAANMKVLRD 58

Query: 65 AEVKNCDIFIAITDRDEVNMISAVLAKKMGAKETVVRMRNPEYSNPYFKDKNFLGFSSVV 124

A V+ D+F+A+T DEVN++S + AKK+GAK +VR+ NPEY + ++ LG+ ++

Sbjct: 59 AGWRADVFIAOTGNDEWLLSGLAAKKVGAKNVITOVENPEYVDRPIVKEHPLGYDVLI 118

Query: 125 NPELLAAQYIANTIEFPNATSVEHFANGRVMLMEFKILEGNKLCHTSMSQIRKKFGNIVI 184

P+L AQ A I P A V F+ G+V ++E +++EG+K +++ + N+VI
Sbjct: 119 CPQLSIAQEAARLIGIPGAIEWTFSGGKVEMIELQVMEGSKADGKAIADLYLP-QNWI 177

Query: 185 CAIERDGKLIIPDGDATIQVKDKIFVTGNRIEMILFHNYVKNKVVKNLMVIGAGRIAYYL 244

+1 R+G + IP GD ++ D++ + ++ + V + + + GAG I Y

Sbjct: 178 ASIYRNGHIEIPRGDTVLRAGDRVAIVSKTEDVEMLKGIFGPPVTRRVTIFGAGTIGSYT 237

Query: 245 LNILKNTNTHVKLVELNQEQAEYFSQEFPNvPVVHGDGTAKNILLEESVTSFDAVATLTG 304
IL T VKL+E + E+ E S E V+VGDT L+EE + DAV T

Sbjct: 238 AKILAKGMTSVKLIESSMERCEALSGELEGVRIVCGDATDIEFLIEEEIGKSDAVLAATE 297

Query: 305 VDEENIITSMFLESIGIPKNITKVNRTSLLEIIDDKQLSSIITPKRIAVDHVMHFVRGRV 364
DE+N++ S+ +++G I KV + +++ + + + P+ + + V +R
Sbjct: 298 SDEKNLLISLLSKNLGARIAIAKVEKEEYVKLFEAVGVDVALNPRSVTYNEVSKLLR 354

Query: 365 NAQDSNLEAMHHIANDRIETLQFEIKETSKLANRSLASLKLKQNILIAAIIRNNKTIFPT 424
+E + I + + + ++L ++L L L ++ +1 AI+R N+ + P
Sbjct: 355 TMRIETLAEIEGTAVVEV---VVRNTRLVGKALKDLPLPKDAIIGAIVRGNECLIPR 408

Query: 425 GEDVLTVGDRIVVITLLKNITRTSDM 450
G+ + DR++V I + ++
Sbjct: 409 GDTTIEYEDRLLVFAKWDEIEKIEEI 434




Identities = 48/212 (22%), Positives = 99/212 (46%), Gaps = 15/212 (7%)

Query: 3 VKMRIIVVGGGKVGTALCRSLVAEKHDVVLIEKKENVLKRVTKQHDIMGIV-GNGANYKI 61
V R+ + G G +G+ + L V LIE +++ + + + IV G+ + +
Sbjct: 221 VTRRVTIFGAGTIGSYTAKILAKGMTSVKLIESSMERCEALSGELEGVRIVCGDATDIEF 280

Query: 62 LEQAEVKNCDIFIAITDRDEVNMISAVLAKKMGAKETVVRMRNPEYSNPYFKDKNFLGFS 121
L + E+ D +A T+ DE N++ ++L+K +GA+ + ++ EY + +G
Sbjct: 281 LIEEEIGKSDAVLAATESDEKNLLISLLSKNLGARIAIAKVEKREYVKLFEAVGVD 336

Query: 122 SVVNPELLAAQYIA---NTIEFPNATSVEHFANGRVMLMEFKILEGNKLCHTSMSQIRKK 178
+NP + ++ T+ +E A V++ +++ G L + +
Sbjct: 337 VALNPRSVTYNEVSKLLRTMRIETIAEIEGTAVVEVVVRNTRLV-GKALKDLPLPK 391

Query: 179 FGNIVICAIERDGKLIIPDGDATIQVKDKIFV 210
+ +1 AI R + +IP GD TI+ +D++ V
Sbjct: 392 --DAIIGAIVRGNECLIPRGDTTIEYEDRLLV 421

There is also homology to SEQ ID 4716.

SEQ ID 8836 (GBS384) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 69 (lane 2; MW 53kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 6; MW 78kDa). 

The GBS384-GST fusion product was purified (Figure 212, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 279), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1532

A DNA sequence (GBSx1623) was identified in S.agalactiae <SEQ ID 4717> which encodes the amino
acid sequence <SEQ ID 4718>. Analysis of this protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4948 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1533

A DNA sequence (GBSx1624) was identified in S.agalactiae <SEQ ID 4719> which encodes the amino
acid sequence <SEQ ID 4720>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have an uncleavable N-term signal seq



INTEGRAL Likelihood = -12.58 Transmembrane 37 - 53 ( 33 - 61)
INTEGRAL Likelihood = -11.57 Transmembrane 183 - 199 ( 179 - 214)
INTEGRAL Likelihood = -10.03 Transmembrane 397 - 413 ( 392 - 424)
INTEGRAL Likelihood = -6.79 Transmembrane 14 - 30 ( 5 - 31)
INTEGRAL Likelihood = -6.42 Transmembrane 71 - 87 ( 69 - 93)
INTEGRAL Likelihood = -4.99 Transmembrane 278 - 294 ( 274 - 295)
INTEGRAL Likelihood = -4.19 Transmembrane 133 - 149 ( 132 - 152)
INTEGRAL Likelihood = -4.09 Transmembrane 327 - 343 ( 324 - 344)
INTEGRAL Likelihood = -2.44 Transmembrane 236 - 252 ( 234 - 252)
INTEGRAL Likelihood = -0.59 Transmembrane 456 - 472 ( 456 - 472)
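The ALOM lines elsewhere in this section (a count of 1 for a single segment with likelihood -0.06, and a count of 0 for a value of 8.96, both against a threshold of 0.0) suggest that a segment is counted as transmembrane when its INTEGRAL likelihood falls below the threshold. Under that assumption, a list of INTEGRAL records like the one above could be filtered with the short sketch below. The names are hypothetical, and this is not the PSORT/ALOM implementation itself.

```python
import re

# Hypothetical parser for records of the form
# "INTEGRAL Likelihood = -12.58 Transmembrane 37 - 53 ( 33 - 61)"
TM_RECORD = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def transmembrane_segments(text, threshold=0.0):
    """Return (start, end, likelihood) for segments below threshold, sorted by start."""
    segments = []
    for match in TM_RECORD.finditer(text):
        likelihood = float(match.group(1))
        start, end = int(match.group(2)), int(match.group(3))
        # Assumption: a segment counts as integral when likelihood < threshold.
        if likelihood < threshold:
            segments.append((start, end, likelihood))
    return sorted(segments)


if __name__ == "__main__":
    # The last record is hypothetical, included only to show the threshold filter.
    sample = """
INTEGRAL Likelihood = -12.58 Transmembrane 37 - 53 ( 33 - 61)
INTEGRAL Likelihood = -0.59 Transmembrane 456 - 472 ( 456 - 472)
INTEGRAL Likelihood = 0.50 Transmembrane 10 - 26 ( 8 - 30)
"""
    for start, end, likelihood in transmembrane_segments(sample):
        print(start, end, likelihood)
```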



Final Results

bacterial membrane Certainty=0.6031 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10065> which encodes amino acid sequence <SEQ ID
10066> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB90400 GB:AE001046 TRK potassium uptake system protein (trkH)
[Archaeoglobus fulgidus]
Identities = 166/480 (34%), Positives = 262/480 (54%), Gaps = 10/480 (2%)



Query: 1 MNKSMIRFLLSKLLLIEAALLAIPLTVGLIYREP-QSVMMSIVITMIILIILGLLGSLFK 59
MN + +L KLL++ + +PL ++ EP ++ +++++ +LG G +
Sbjct: 1 MNLRLTASILGKLLMLFSFSFILPLIAAHVFEEPYHPFLIPAALSLLVGAVLGY-GIKTE 59

Query: 60 PKNYHIYTKEGMLIVALCWILWSFFGALPFVISGQIPNIIDAFFEVSSGFTTTGATILDD 119
+ + KE IVAL W+ S FG++P++I G P +DAFFE SGFTTTGA++L
Sbjct: 60 SEFDSLRHKESFAIVALIWLFMSIFGSIPYIIFGISP--VDAFFESMSGFTTTGASVLTP 117

Query: 120 VSVLSPALLFWRSFTHLIGGMGVLVFALAIMENSKNSHLEVMRAEVPGPVFGKVVSKLKK 179
L +LL WRS T IGGMG++V LAI N + +AE PG K+ +++
Sbjct: 118 EE-LPKSLLLVmSLTQWIGGMGIIVLFLAIFPNVAKRSTVLFQAEYPGVSLSKLKPRIRD 176

Query: 180 TAQILYLLYLLMFAVFAVILYFAGMPFFDSIIIAMGTAGTGGFAVYNDSIAHYNSPLITN 239
TA LY +YLL+ +LY G+ FD+I T TGG++ +++SIA + +
Sbjct: 177 TALSLYKVYLLLTIAEVALLYALGLSLFDAINHTFTTLSTGGYSTHSESIAFFKDVRVEA 236

Query: 240 LVSIGMLIFGVNFNLYYLLLLRKIKAFFGDEELKTYLRIVAIATFMIALNVIGMYDNFRQ 299
+V+ +GNFLYLL K F + E + Y+ +A+A+ +IA + Y F +
Sbjct: 237 VVAFFAFLGGANFALIYFLLSGK-PVIFRNTEFRAYVCFLALASVVIAAVNLDRYSIF-E 294

Query: 300 GLEHIFFEVSAIITTTGFGVTDITRWPLFSQVILLFLMFIGGSAGSTAGGFKVMRSLILA 359
L + F+ +I+TTTGF D W +++IL+ LMFIGGS+GST GG KV+R +L
Sbjct: 295 SLRYSIFQAVSIMTTTGFTTADFDAWSDSAKLILVVLMFIGGSSGSTGGGIKVIRIYLLI 354

Query: 360 KIARNQVLSTLYPNRvMSLHINKSVLDKNTQHGVLKYLTIYMIFMALvLVLTLDTNDFL 419
K A +Q+L P V ++ + K + + +Y+ IF ++++L D +
Sbjct: 355 K^AWQILRAAEPRTVRAVKFEGRAIKKEILDDIAAFFVLYILIFAVSSILVSLSGYDIV 414

Query: 420 VVISAAASCFNNIGP LLGSNETFSFFSPFSKLLLSFAMIAGRLEIYPVLLMFIPKTW 476
ISA A+ N+GP L G+ E ++ F +K+LL+ M GRLEI+ V+ +FIP W
Sbjct: 415 TSISATAATLGNVGPGLGIAGAAENYASFPSLTKILLAVNMWIGRLEIFTVVSLFIPTFW 474



No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1534 

A DNA sequence (GBSx1625) was identified in S.agalactiae <SEQ ID 4721> which encodes the amino
acid sequence <SEQ ID 4722>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence (or aa 1-20)

Final Results

bacterial cytoplasm Certainty=0.2870 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD36530 GB:AE001797 conserved hypothetical protein

[Thermotoga maritima]
Identities = 43/75 (57%), Positives = 57/75 (75%), Gaps = 1/75 (1%)

Query: 2 LKSFLIFLVRFYQKNISPAFPASCRYRPTCSTYMIEAIQKHG-LKGVLMGIARILRCHPL 60
+K LI L+RFYQ+ ISP P +CR+ PTCS Y I+A++KHG LKG +G+ RILRC+PL

Sbjct: 1 MKKLLIMLIRFYQRYISPLKPPTCRFTPTCSNYFIQALEKHGLLKGTFLGLRRILRCNPL 60

Query: 61 AHGGNDPVPDHFSLR 75
+ GG DPVP+ FS +
Sbjct: 61 SKGGYDPVPEEFSFK 75

A related DNA sequence was identified in S.pyogenes <SEQ ID 4723> which encodes the amino acid 
sequence <SEQ ID 4724>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3639 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 53/78 (67%), Positives = 60/78 (75%)

Query: 1 MLKSFLIFLVRFYQKNISPAFPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPL 60

M+K LI V+ YQK ISP P SCRY+PTCS YM+ AI+KHG KG+LMGIARILRCHP
Sbjct: 1 MMKKLLIVSVKAYQKYISPLSPPSCRYKPTCSAYMLTAIEKHGTKGILMGIARILRCHPF 60

Query: 61 AHGGNDPVPDHFSLRRNK 78
GG DPVP+ FSL RNK

Sbjct: 61 VAGGVDPVPEDFSLMRNK 78

SEQ ID 4722 (GBS233) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 58 (lane 3; MW 35.6kDa).

The GBS233-GST fusion product was purified (Figure 207, lane 10) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 280), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1535

A DNA sequence (GBSx1626) was identified in S.agalactiae <SEQ ID 4725> which encodes the amino
acid sequence <SEQ ID 4726>. This protein is predicted to be a ribosomal large subunit pseudouridine
synthase B (rluB). Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.2957 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05295 GB:AP001512 pseudouridylate synthase [Bacillus halodurans] 
Identities = 130/239 (54%), Positives = 175/239 (72%), Gaps = 2/239 (0%)

Query: 2 RINKYIAHAGIASRRKAEELIKQGMVTINGQVVNELATQVKAG-DLVEIEGSPIYNEEKV 60
R+ K IA AGIASRRKAE+LI +G V +NGQVV EL +V D +E+EG P+ EE V
Sbjct: 3 RLQKVIAQAGIASRRKAEQLILEGKVKVNGQVVKELGIKVNPNQDDIEVEGVPVEKEEPV 62

Query: 61 YYLLNKPRGVISSVSDDKGRKTVIDLLPQVKERIYPVGRLDWDTTGLLILTNDGDFTDKM 120
Y+LL KP GVISSV DDKGRK V D L ++++R+YPVGRLD+DT+GLL+LTNDG+F + +
Sbjct: 63

Query: 121
+HPR++ 1 +KVY+A+VKGI T++ L+ L RGV ++ T PA+ ++ VD K ++V+LT
Sbjct: 122

Query: 181
IHEGRN QV++MFE +G V KL R QF UDL+G+ PG+ R L E+ L A+ K
Sbjct: 182 IHEGRNRQVVRMFETIGCEVMKLKREQFAFLDLSGMNPGDVRPLKPIEVKHLRELAVTK 240

A related DNA sequence was identified in S. pyogenes <SEQ ID 4727> which encodes the amino acid
sequence <SEQ ID 4728>. Analysis of this protein sequence reveals the following:
Possible site: 18

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.1587 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 210/239 (87%), Positives = 228/239 (94%)

Query: 1
MRINKYIAHAGIASRRKAEELIKQG+VT+NGQV+ +LAT VK+GD+VEIEGSPIYNEEKV
Sbjct: 9

Query: 61
YYLLNKPRG ISSVSDDKGRKTV+DLLPQVKERIYPVGRLDWDT+G+LILTNDGDFTD M
Sbjct: 69

Query: 121
IHPRNEIDKVYLARVKGIATKENLRPLTRG+VIDGKKTKPARY I++V+ +K+RS+VELT
Sbjct: 129

Query: 181
IHEGRNHQVKKMFE VGLLVDKLSRT+FGT+DJj GLRPGEARRLNKKE I SQLHN A K
Sbjct: 189
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1536

A DNA sequence (GBSx1627) was identified in S.agalactiae <SEQ ID 4729> which encodes the amino
acid sequence <SEQ ID 4730>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1476 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05280 GB:AP001512 unknown conserved protein [Bacillus halodurans]

Identities = 75/180 (41%), Positives = 107/180 (58%), Gaps = 10/180 (5%)

Query: 6 SIEALLFVAGEDGLSLRQMAELLSLTPSALIQQLEKLAKRYEEDDDSSLLLLETAQTYKL 65
+ IE +LFV G++G++L ++ +LL L+ + LE+L Y D+ L + E A ++L
Sbjct: 9 AIEGILFVRGDEGVTLEELCDLLELSTDVVYAALEELRLSYT-DEARGLRIEEVAHAFRL 67






Query: 66 VTKDSYMTLLRDYAKAPINQSLSRASLEVLSIIAYKQPITRIEIDDIRGVNSSGAITRLI 125

TK + A + + LS+A+LE L+IIAY+QPITRIE+D++RGV S AI L 

Sbjct: 68 STKPE1APYFKKLALSTLQSGLSQAALETLAIIAYRQPITRIEVDEVRGVKSEKAIQTLT 127 

Query: 126 AFGLIKEAGKKEVLGRPNLYETTNYFLDYMGINQLDDL IDASSIELVDEEVSLF 179

+ LIKE G+ + GRP LY TT FLD+ G+ L +L ID SSI EE LF 
Sbjct: 128 SRLLIKEVGRAQGTGRPILYG1TPQFLDHFGLKSLKELPPLPEDIDESSI GEEADLF 184 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4731> which encodes the amino acid
sequence <SEQ ID 4732>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1062 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 130/179 (72%), Positives = 159/179 (88%)

Query: 1 MTYLGSIEALLFVAGEDGLSLRQMAELLSLTPSALIQQLEKLAKRYEEDDDSSLLLLETA 60
MTYL IEALLFVAGE+GLSLR +A +LSLTP+AL QQLEKL+++YE+D SSL L+ETA
Sbjct: 1 MTYLSQIEALLFVAGEEGLSLRHLASMLSLTPTALQQQLEKLSQKYEKDQHSSLCLIETA 60

Query: 61 QTYKLVTKDSYMTLLRDYAKAPINQSLSRASLEVLSIIAYKQPITRIEIDDIRGVNSSGA 120
TY+LVTK+ + LLR YAK P+NQSLSRASLEVLSI+AYKQPITRIEIDDIRGVNSSGA
Sbjct: 61 NTYRLVTKEGFAELLRAYAKTPMNQSLSRASLEVLSIVAYKQPITRIEIDDIRGVNSSGA 120






Query: 121 ITRLIAFGLIKEAGKKEVLGRPNLYETTNYFLDYMGINQLDDLIDASSIELVDEEVSLF 179 

+++L+AF LI+EAGKK+V+GRP+LY TT+YFLDYMGIN LD+LI+ S++E DEE++LF 
Sbjct: 121 LSKLLAFDLIREAGKKDVVGRPHLYATTDYFLDYMGINHLDELIEVSAVEPADEEIALF 179 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1537

A DNA sequence (GBSx1628) was identified in S.agalactiae <SEQ ID 4733> which encodes the amino
acid sequence <SEQ ID 4734>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.1012 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14254 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 89/246 (36%), Positives = 145/246 (58%), Gaps = 19/246 (7%)

Query: 3 IKLKDFEGPLDLLLHLVSKYEVDIYDVPIVEVIEQYLAYIATLQAMRLEVAGEYMLMASQ 62 

+K+ FEGPLDLLLHL+++ E+DIYD+P+ ++ EQYL Y+ T++ + L++A EY++MA+ 
Sbjct: 6 VKIDTFEGPLDLLLHLINRLEIDIYDIPVAKITEQYLLYVHTMRVLELDIASEYLVMAAT 65 

Query: 63 LMLIKSRNLLPK WESNP I - EDDPEMELLSQLEEYRRFKVLSEELANQHQERAKYF 117 

L+ IKSR LLPK + E + E+DP EL+ +L EYR++K +++L + +ER K F 
Sbjct: 66 LLSIKSRMLLPKQEEELFEDELLEEEDPREELIEKLIEYRKYKDAAKDLKEREEERQKSF 125

Query: 118 SKPKQEVIFEDAILLHDKSVMDLFLTFSQMMSQKQKELSNS QTVIEKEDYRIED 171 

+KP ++ + +S L +T M+ QK L +T I ++D IE 

Sbjct: 126 TKPPSDL--SEYAKEVKQSEQKLSVTVYDMIGAFQKVLKRKKINRPMETTITRQDIPIEA 183

Query: 172 MMIVIERHFNLKKKTT---LQEVFADCQTKSEMITLFLAMLELIKLHQITVEQDSNFSQV 228 

M I +LK + T ++F + K ++ FLA+LEL+K + +EQ+ NFS + 

Sbjct: 184 RMNEIVH--SLKSRGTRINFMDLF-PYEQKEHLVVTFLAVLELMKNQLVLIEQEHNFSDI 240

Query: 229 ILRKEE 234 
+ E 

Sbjct: 241 YITGSE 246 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4735> which encodes the amino acid 
sequence <SEQ ID 4736>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.61 Transmembrane 199 - 215 ( 199 - 218) 



Final Results 

bacterial membrane Certainty=0.2444 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB14254 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 86/239 (35%), Positives = 145/239 (59%), Gaps = 15/239 (6%)

Query: 3 IKLKDFEGPLDLLLHLVSQYKVDIYEVPIVEVIEQYLNYIETLQVMKLEVAGDYMLMASQ 62

+K+ FEGPLDLLLHL+++ ++DIY++P+ ++ EQYL Y+ T++V++L++A +Y++MA+ 
Sbjct: 6 VKIDTFEGPLDLLLHLINRLEIDIYDIPVAKITEQYLLYVHTMRVLELDIASEYLVMAAT 65 

Query: 63 LMLIKSRRLLPKVVEHI EEDLEQDLLEKIEEYSRFKAVSQALAKQHDQRAKWY 115

L+ IKSR LLPK E + EED ++L+EK+ EY ++K ++ L ++ ++R K + 

Sbjct: 66 LLSIKSRMLLPKQEEELFEDELLEEEDPREELIEKLIEYRKYKDARKDLKEREEERQKSF 125 

Query: 116 SKPKQELI-FEDAILQEDK TVMDLFLAFSNIMAAKRAVLKNNHTVIERDDYKIEDM 170

+KP +L + + Q ++ TV D+ AF ++ K+ + + T I R D IE 
Sbjct: 126 TKPPSDLSEYAKEVKQSEQKLSVTVYDMIGAFQKVLKRKK-INRPMETTITRQDIPIEAR 184
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Query: 171 MASIKQRLEKEOT-IRLSAIFEECQTLNEVISIFLASLELIKLHWFVEQLSNFGAIIL 228 

MI L+ I +F Q + V++ FLA LEL+K +V +EQ NF I + 

Sbjct: 185 MNEIVHSLKSRGTRINFMDLFPYEQKEHLVOT-FLAVLELMKNQLVLIEQEHNFSDIYI 242 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 156/235 (66%) , Positives = 191/235 (80%) , Gaps = 2/235 (0%) 

Query: 1 MDIKLKDFEGPLDLLLHLVSKYEVDIYDVPIVEVIEQYLAYIATLQAMRLEVAGEYMLMA 60

MDIKLKDFEGPLDLLLHLVS+Y+VDIY+VPIVEVIEQYL YI TLQ M+LEVAG+YMLMA
Sbjct: 1 MDIKLKDFEGPLDLLLHLVSQYKVDIYEVPIVEVIEQYLNYIETLQVMKLEVAGDYMLMA 60 

Query: 61 SQLMLIKSRNLLPKWESNPIEDDPEMELLSQLEEYRRFKVLSEELANQHQERAKYFSKP 120 

SQLMLIKSR LLPKWE IE+D E +LL ++EEY RFK +S+ LA QH +RAK++SKP 
Sbjct: 61 SQLMLIKSRRLLPKWEH--IEEDLEQDLLEKIEEYSRFKAVSQALAKQHDQRAKWYSKP 118

Query: 121 KQEVIFEDAILLHDKSVMDLFLTFSQMMSQKQKELSNSQTVIEKEDYRIEDMMIVIERHF 180 

KQE+IFEDAIL DK+VMDLFL FS +M+ K+ L N+ TVIE++DY+IEDMM I++ 
Sbjct: 119 KQELIFEDAILQEDKTVMDLFLAFSNIMAAKRAVLKNNHTVIERDDYKIEDMMASIKQRL 178 

Query: 181 NLKKKTTLQEVFADCQTKSEMITLFLAMLELIKLHQITVEQDSNFSQVILRKEEK 235 

+ L +F +CQT +E+I++FLA LELIKLH + VEQ SNF +ILRKE+K 
Sbjct: 179 EKENVIRLSAIFEECQTLNEVISIFLASLELIKLHVVFVEQLSNFGAIILRKEKK 233

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1538 

A DNA sequence (GBSx1629) was identified in S.agalactiae <SEQ ID 4737> which encodes the amino
acid sequence <SEQ ID 4738>. This protein is predicted to be pXO1-18. Analysis of this protein sequence
reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.14 Transmembrane 128 - 144 ( 127 - 145) 

Final Results 

bacterial membrane Certainty=0.2657 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05248 GB:AP001512 integrase/recombinase [Bacillus halodurans] 
Identities = 67/271 (24%) , Positives = 117/271 (42%) , Gaps = 35/271 (12%) 



Query: 11 LKTMINDINNFIESKK LSLNSRKSYHYDLKQFYKII GGHVNSEKLALY 58
++T+ N++ F+ +K LS N+ +SY DLKQ+ + + ++ E + Y
Sbjct: 1 METVNNNLQQFLHFQKVERGLSNNTIQSYGRDLKQYIQYVERVEEIRSARNITRETILHY 60

Query: 59 QQSLSEFKL--TARKRKLSAVNQFLFFLYNRGTLKEFYRL QETEKITLAQTKSQI 111
L E T+ R ++A+ F FL + + T+++ AT ++
Sbjct: 61 LYHLREQGRAETSIARAVAAIRSFHQFLLREKLSDSDPTVHVEIPKATKRLPKALTIEEV 120

Query: 112 MDLSNFYQDTDYPSGRLIALLIL--SLGLTPAEIANLKKADFDTTFNILS-IEKSQMKRI 168
L N Q D S R A+L L + G+ +E+ L +D + + + K +RI
Sbjct: 121 EALLNSPQGRDPFSLRNKAMLELLYATGMRVSELIGLTLSDIHLSMGFVRCLGKGNKERI 180

Query: 169 LKLPEDLLPFLLESLEEDG DLVF-EHNGKPYSRQWFFNQLTDFLNEKN-E 216
+ + + + +ES +G D VF H+G+P SRQ F+ L N +
Sbjct: 181 IPIGQ-VATFAVESYIANGRGKLMKKQSHDHVFVNHHGRPLSRQGFWKMLKQLAKNVNID 239

WO 02/34771
PCT/GB01/04789
-1707-

Query: 217 QQLTAQLLREQFILKQKENGKTMTELSRLLG 247
+ LT LR F ENG + + +LG
Sbjct: 240 KPLTPHTLRHSFATHLLENGADLRAVQEMLG 270

A related DNA sequence was identified in S.pyogenes <SEQ ID 4739> which encodes the amino acid
sequence <SEQ ID 4740>. Analysis of this protein sequence reveals the following:

Possible site: 20 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 111 - 127 ( 110 - 127)

Final Results

bacterial membrane Certainty=0.1362 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/243 (48%) , Positives = 167/243 (68%) , Gaps = 1/243 (0%) 

Query: 18 INNFIESKKLSLNSRKSYHYDLKQFYKIIGGHVNSEKLALYQQSLSEFKLTARKRKLSAV 77
I FI SK LS NS+K+Y YDL+QF ++IG VN +KL LYQ S++ L+A+KRKLS

Sbjct: 5 IEPFIASKALSQNSQKAYRYDLQQFCQLIGERVNQDKLLLYQNSIANLSLSAKKRKLSTA 64

Query: 78 NQFLFFLYNRGTLKEFYRLQETEKITLAQTK-SQIMDLSNFYQDTDYPSGRLIALLILSL 136
NQFL++LY L ++RL +T K+ + + + I++ FYQ T + G+LI+LLIL L
Sbjct: 65 NQFLYYLYQIKYLNSYFRLTDTMKVMRTEKQQAAIINTDIFYQKTPFVWGQLISLLILEL 124

Query: 137 GLTPAEIANLKKADFDTTFNILSIEKSQMKRILKLPEDLLPFLLESLEEDGDLVFEHNGK 196 

GLTP+E+A ++ A+ D F +L+++ + R+L L + L+PFL + L +FEH G 

Sbjct: 125 GLTPSEVAGIEVANLDLNFQMLTLKTKKBVRVLPLSQILIPFLEQQLVGKEVYLFEHRGI 184 

Query: 197 PYSRQWFFNQLTDFLNEKNEQQLTAQLLREQFILKQKENGKTMTELSRLLGLKTPITLER 256 

P+SRQWFFN L F+ + LTAQ LREQFILK+K GK++ ELS +LGLK+P+TLE+ 

Sbjct: 185 PFSRQWFFNHLKTFVRSIGYEGLTAQKLREQFILKEKLAGKSIIELSDILGLKSPMTLEK 244 

Query: 257 YYR 259

YY+ 

Sbjct: 245 YYK 247 

SEQ ID 4738 (GBS383) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 7; MW 32kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 5; MW 57.1kDa).

The GBS383-GST fusion product was purified (Figure 212, lane 8) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 308), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1539 

A DNA sequence (GBSx1630) was identified in S.agalactiae <SEQ ID 4741> which encodes the amino
acid sequence <SEQ ID 4742>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2465 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05201 GB:AP001512 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 38/136 (27%) , Positives = 73/136 (52%) , Gaps = 1/136 (0%) 

Query: 7 ESFLLNHLDHYLIPAEDVAIFVDTHNADHVMLLLASNGFSRVPVITKEKKYVGTISISDI 66

++ + N L +IP E VA ++ +H +IH-L +G++ +PV+ + K G IS S I 
Sbjct: 7 QNIMDNDLKELVIPFEKVAHVHLSNPLEHALLVLIKSGYTAIPVLDEHSKLHGVISKSLI 66 

Query: 67 MGYQSKGQLTDWE-MAQTDIVEMVNTKIEPINEAATLTAIMHKIVDYPFLPVISDQNDFR 125 

+ + + + E +A + +++N +I I+ A+ + + + +PF+ ++ D F

Sbjct: 67 LDALLGVERIEMERIAHLVVKDVMNPEIPTIHHKASFSRALKVSIAHPFICILDDDGSFL 126 

Query: 126 GIITRKSILKAINSLL 141 

GI+TR +IL IN L 
Sbjct: 127 GILTRSTILSFINRQL 142 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4743> which encodes the amino acid 
sequence <SEQ ID 4744>. Analysis of this protein sequence reveals the following: 
Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3539 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 119/153 (77%) , Positives = 137/153 (88%) 

Query: 1 MIAKEFESFLLNHLDHYLIPAEDVAIFVDTHNADHVMLLLASNGFSRVPVITKEKKYVGT 60 

MIAKEFE+FL++HLD+YLIP +D+AIF+DTHNADHVMLLL SNGFSRVPVIT+EKKYVGT
Sbjct: 1 MIAKEFETFLMSHLDNYLIPEQDLAIFIDTHNADHVMLLLVSNGFSRVPVITREKKYVGT 60 

Query: 61 ISISDIMGYQSKGQLTDWEMAQTDIVEMVNTKIEPINFAATLTAIMHKIVDYPFLPVISD 120 

ISISDIM YQSK QLTDWEM+QTDI EMVNTKIE I+ ++LT IMHK++D+PFLPV+
Sbjct: 61 ISISDIMMYQSKRQLTDWEMSQTDIGEMVNTKIETISITSSLTEIMHKLIDFPFLPVVDR 120

Query: 121 QNDFRGIITRKSILKAINSLLHDFTDEYTITPK 153 

N F GIITRKSILKA+NSLLHDFTD+YTI K 
Sbjct: 121 ANRFVGIITRKSILKAVNSLLHDFTDDYTIIKK 153 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1540 

A DNA sequence (GBSx1631) was identified in S.agalactiae <SEQ ID 4745> which encodes the amino
acid sequence <SEQ ID 4746>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4421 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06785 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 55/169 (32%), Positives = 95/169 (55%), Gaps = 1/169 (0%) 

Query: 5 KLVVMSDSHGDRDIVKDIKNHYLGKVDAIFHNGDSELPSSDPIWEGIHVVTGNCDYDSGY 64

KL+++SDSHG D +K + + + +VDAI H GDSELP D EG+++V GNCD+ +
Sbjct: 2 KLLILSDSHGWSDELKAVADKHRQEVDAIIHCGDSELPRDDRALEGMNIVRGNCDFGVDF 61

Query: 65 PEVLVTKIDNAVIVQTHGHLHQINFTWDKLDLLAQQEDADICLYGHLHRADAWKNGKTIF 124 
PE + + + + THGHL+ + ++ L A++ A + +GH H A +++ +F

Sbjct: 62 PEDFIKTOGDFNVYVTHGHLYNVKMSYVSLTYRAEEVGAQLVCFGHSHVATSFQENGIVF 121 

Query: 125 INPGSVLQPRGPINEKLYAVVTITDSKVLVEYYTRQHQPYPNLTKELSR 173
+NPGS+ PR E+ Y+ +D++++ R +L + R 

Sbjct: 122 VNPGSLRLPRNR-KEQTYCLAYVRDDQIELTFLDRDGHEVTDLQRTYLR 169

A related DNA sequence was identified in S.pyogenes <SEQ ID 4747> which encodes the amino acid 
sequence <SEQ ID 4748>. Analysis of this protein sequence reveals the following: 

Possible site: 14 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3835 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 
Identities = 117/173 (67%) , Positives = 143/173 (82%) 

Query: 1 MAIRKLVVMSDSHGDRDIVKDIKNHYLGKVDAIFHNGDSELPSSDPIWEGIHVVTGNCDY 60

MA + ++VMSDSHGDRDIV+ IK+ YLG+VDAIFHNGDSEL SSDPIW GI+VV GNCDY
Sbjct: 1 MASKTIIVMSDSHGDRDIVQAIKDKYLGQVDAIFHNGDSELNSSDPIWAGIYVVGGNCDY 60

Query: 61 DSGYPEVLVTKIDNAVIVQTHGHLHQINFTWDKLDLLAQQEDADICLYGHLHRADAWKNG 120 
D+GYP+ LVT++ I QTHGHL+ INFTWDKLD AQ+ ADICLYGHLHR AW+ G

Sbjct: 61 DTGYPDRLVTQLGTVTIAQTHGHLYHINFTWDKLDYFAQEVVADICLYGHLHRPAAWQVG 120

Query: 121 KTIFINPGSVLQPRGPINEKLYAVVTITDSKVLVEYYTRQHQPYPNLTKELSR 173
+T+F+NPGSV QPRG INEKLYA V +TD+++ V+Y+TR H+ YP+L+KE R
Sbjct: 121 QTLFMNPGSVTQPRGEINEKLYARVELTDTQIKVDYFTRDHKLYPSLSKEFKR 173

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1541 

A DNA sequence (GBSx1632) was identified in S.agalactiae <SEQ ID 4749> which encodes the amino
acid sequence <SEQ ID 4750>. This protein is predicted to be a HAM1 family protein. Analysis of this
protein sequence reveals the following:

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm Certainty=0.1218 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14796 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis]
Identities = 96/189 (50%) , Positives = 130/189 (67%) , Gaps = 1/189 (0%)



Query: 128 LIATHNEGKTKEFRELFGKLGLKVENLNDYPDLPEVEETGMTFEENARLKAETISKLTGK 187
+IATHN GK KEF+E+ G V++L + E+EETG TFEENA +KAE ++K K
Sbjct: 8 IIATHNPGKVKEFKEILEPRGYDVKSLAEIGFTEEIEETGHTFEENAIMKAEAVAKAVNK 67

Query: 188 MVISDDSGLKVDALGGLPGVWSARFSGPDATDARNNAKLLHELAMVFDKERRSAQFHTTL 247
MVI+DDSGL +D LGG PGV+SAR++G D N K+L EL + +KE+R+A+F L
Sbjct: 68 MVIADDSGLSIDNLGGRPGVYSARYAGEQKDDQANIEKVLSELKGI-EKEQRTARFRCAL 126

Query: 248 VVSAPNKESLVVEAEWPGYIGTEPKGENGFGYDPLFIVGEGSRTAAELSAQEKNNLSHRG 307
VS P +E+ VE GYI EP+GE GFGYDP+FIV + +T AEL++ EKN +SHR
Sbjct: 127 AVSIPGEETKTVEGHVEGYIAEEPRGEYGFGYDPIFIVKDKDKTMAELTSDEKNKISHRA 186

Query: 308 QAVRKLMEV 316
A++KL ++
Sbjct: 187 DALKKLSKL 195





A related DNA sequence was identified in S.pyogenes <SEQ ID 4751> which encodes the amino acid
sequence <SEQ ID 4752>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2590 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 214/325 (65%) , Positives = 253/325 (77%) , Gaps = 5/325 (1%) 



Query: 1 MTKTIFESKTEGNWFLGSFQAFNYFTCFG-NDESYEAIQDWHRLLSTLKVE---GLQLH 56
M++ I+E K E NWF+G N + +G + + I D+ + +TL E G +
Sbjct: 14 MSEKIYEYKDENNWFIGKMTGHNLISGWGVKHTTIKKIDDLLDGIAATLDWENPKGYDVS 73

Query: 57 WQMTSDFQLLAFLVDMINQEYSRHIKVTQHKGAILVSEDDQLFLVHLPKEGTSLEKFFD 116
W+ S L+ F++DMINQE R IKVT H G IL+ E+ +L V+LP+ G S FF
Sbjct: 74 WRHQSPLSLITFIIDMINQETQREIKVTPHAGTILLMENAKLLAVYLPEGGVSTATFF- 132

Query: 117 LKNDNNFGDTILIATHNEGKTKEFRELFGKLGLKVENLNDYPDLPEVEETGMTFEENARL 176
++ FGD I MAT NEGKTKEFR LFG+LG +VENLNDYP+LPEV ETG TFEENARL
Sbjct: 133 ATSEQGFGDIILIATRNEGKTKEFRNLFGQLGYRVENLNDYPELPEVAETGTTFEENARL 192

Query: 177 KAETISKLTGKMVISDDSGLKVDALGGLPGVWSARFSGPDATDARNNAKLLHELAMVFDK 236
KAETIS+LTGKMV++DDSGLKVDALGGLPGVWSARFSGPDATDA+NNAKLLHELAMVFD+
Sbjct: 193 KAETISRLTGKMVLADDSGLKVDALGGLPGVWSARFSGPDATDAKNNAKLLHELAMVFDQ 252

Query: 237 ERRSAQFHTTLVVSAPNKESLVVEAEWPGYIGTEPKGENGFGYDPLFIVGEGSRTAAELS 296
++RSAQFHTTLVV+APNK+SLVVEA+WPGYI T+PKGENGFGYDP+FIVGE AAEL
Sbjct: 253 KKRSAQFHTTLVVAAPNKDSLVVEADWPGYIATQPKGENGFGYDPVFIVGETGHHAAELE 312

Query: 297 AQEKNNLSHRGQAVRKLMEVFPKWQ 321
A +KN LSHRGQAVRKLMEVFP WQ
Sbjct: 313 ADQKNQLSHRGQAVRKLMEVFPAWQ 337





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1542

A DNA sequence (GBSx1633) was identified in S.agalactiae <SEQ ID 4753> which encodes the amino
acid sequence <SEQ ID 4754>. This protein is predicted to be glutamate racemase (murI). Analysis of this
protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.86 Transmembrane 114 - 130 ( 114 - 130)

Final Results

bacterial membrane Certainty=0.1744 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 10067> which encodes amino acid sequence <SEQ ID 
10068> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF72713 GB:AF263927 glutamate racemase [Carnobacterium sp. St2] 
Identities = 160/267 (59%) , Positives = 202/267 (74%) , Gaps = 3/267 (1%) 



Query: 27 MDSRPIGFLDSGVGGLTVVKEMFRQLPEEEVIFIGDQARAPYGPRPAQQIREFTWQMVNF 86
M + IGF+DSGVGGLTVVKE RQLP E + ++GD AR PYGPRP Q+R+FTW+M +F
Sbjct: 1 MKKQAIGFIDSGVGGLTVVKEAMRQLPNESIYYVGDTARCPYGPRPEDQVRKFTWEMTHF 60

Query: 87 LLTKNVKMIVIACNTATAVAWQEIKEKLDIPVLGVILPGASAAIKSTNLGKVGIIGTPMT 146
LL KN+KM+VIACNTATA A ++IK+KL IPV+GVILPG+ AAIK+T+ ++G+IGT T
Sbjct: 61 LLDKNIKMLVIACNTATAAALKDIKKKLAIPVIGVILPGSRAAIKATHTNRIGVIGTEGT 120

Query: 147 VKSDAYRQKIQALSPNTAVVSLACPKFVPIVESNQMSSSLAKKVVYETLSPLVGK-LDTL 205
VKS+ Y++ I + V SLACPKFVP+VESN+ SS++AKKVV ETL PL + LDTL
Sbjct: 121 VKSNQYKKMIHSKDTKALVTSLACPKFVPLVESNEYSSAIAKKVVAETLRPLKNEGLDTL 180

Query: 206 ILGCTHYPLLRPIIQNVMGAEVKLIDSGAETVRDISVLLNYFEINHNWQNKH-GGHHFYT 264
ILGCTHYPLLRPIIQN +G V LIDSGAETV ++S +L+YF + + QNK +FYT
Sbjct: 181 ILGCTHYPLLRPIIQNTLGDSVTLIDSGAETVSEVSTILDYFNLAVDSQNKEKAERNFYT 240

Query: 265 TASPKGFKEIAEQWLS-QEINVERIVL 290
T S + F IA +WL ++ VE I L
Sbjct: 241 TGSSQMFHAIASEWLQLDDLAVEHITL 267





A related DNA sequence was identified in S.pyogenes <SEQ ID 4755> which encodes the amino acid
sequence <SEQ ID 4756>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.70 Transmembrane 88 - 104 ( 86 - 104)

Final Results

bacterial membrane Certainty=0.1680 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:AAF72713 GB:AF263927 glutamate racemase [Carnobacterium sp. St2]
Identities = 149/267 (55%) , Positives = 202/267 (74%) , Gaps = 3/267 (1%)

Query: 1 MDTRPIGFLDSGVGGLTVVCELIRQLPHEKIVYIGDSARAPYGPRPKKQIKEYTWELVNF 60

M + IGF+DSGVGGLTVV E +RQLP+E I Y+GD+AR PYGPRP+ Q++++TWE+ +F
Sbjct: 1 MKKQAIGFIDSGVGGLTVVKEAMRQLPNESIYYVGDTARCPYGPRPEDQVRKFTWEMTHF 60

Query: 61 LLTQNVKMIVFACNTATAVAWEEVKAALDIPVLGVVLPGASAAIKSTTKGQVGVIGTPMT 120

LL +N+KM+V ACNTATA A +++K L IPV+GV+LPG+ AAIK+T ++GVIGT T
Sbjct: 61 LLDKNIKMLVIACNTATAAALKDIKKKLAIPVIGVILPGSRAAIKATHTNRIGVIGTEGT 120

Query: 121 VASDIYRKKIQLLAPSIQVRSLACPKFVPIVESNEMCSSIAKKIVYDSLAPLVGK-IDTL 179

V S+ Y+K I V SLACPKFVP+VESNE S+IAKK+V ++L PL + +DTL
Sbjct: 121 VKSNQYKKMIHSKDTKALVTSLACPKFVPLVESNEYSSAIAKKVVAETLRPLKNEGLDTL 180

Query: 180 VLGCTHYPLLRPIIQNVMGPSVKLIDSGAECTRDISVLLNYFDIN-GNYHQKAVEHRFFT 238

+LGCTHYPLLRPIIQN +G SV LIDSGAE V ++S +L+YF++ + +++ E F+T
Sbjct: 181 ILGCTHYPLLRPIIQNTLGDSVTLIDSGAETVSEVSTILDYFNLAVDSQNKEKAERNFYT 240

Query: 239 TANPEIFQEIASIWLK-QKINVEHVTL 264

T + ++F IAS WL+ + VEH+TL
Sbjct: 241 TGSSQMFHAIASEWLQLDDLAVEHITL 267





An alignment of the GAS and GBS proteins is shown below. 

Identities = 195/264 (73%) , Positives = 231/264 (86%) 



Query: 27 MDSRPIGFLDSGVGGLTVVKEMFRQLPEEEVIFIGDQARAPYGPRPAQQIREFTWQMVNF 86
MD+RPIGFLDSGVGGLTVV E+ RQLP E++++IGD ARAPYGPRP +QI+E+TW++VNF
Sbjct: 1 MDTRPIGFLDSGVGGLTVVCELIRQLPHEKIVYIGDSARAPYGPRPKKQIKEYTWELVNF 60

Query: 87 LLTKNVKMIVIACNTATAVAWQEIKEKLDIPVLGVILPGASAAIKSTNLGKVGIIGTPMT 146
LLT+NVKMIV ACNTATAVAW+E+K LDIPVLGV+LPGASAAIKST G+VG+IGTPMT
Sbjct: 61 LLTQNVKMIVFACNTATAVAWEEVKAALDIPVLGVVLPGASAAIKSTTKGQVGVIGTPMT 120

Query: 147 VKSDAYRQKIQALSPNTAVVSLACPKFVPIVESNQMSSSLAKKVVYETLSPLVGKLDTLI 206
V SD YR+KIQ L+P+ V SLACPKFVPIVESN+M SS+AKK+VY++L+PLVGK+DTL+
Sbjct: 121 VASDIYRKKIQLLAPSIQVRSLACPKFVPIVESNEMCSSIAKKIVYDSLAPLVGKIDTLV 180

Query: 207 LGCTHYPLLRPIIQNVMGAEVKLIDSGAETVRDISVLLNYFEINHNWQNKHGGHHFYTTA 266
LGCTHYPLLRPIIQNVMG VKLIDSGAE VRDISVLLNYF+IN N+ K H F+TTA
Sbjct: 181 LGCTHYPLLRPIIQNVMGPSVKLIDSGAECTRDISVLLNYFDINGNYHQKAVEHRFFTTA 240

Query: 267 SPKGFKEIAEQWLSQEINVERIVL 290
+P+ F+EIA WL Q+INVE + L
Sbjct: 241 NPEIFQEIASIWLKQKINVEHVTL 264





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1543 

A DNA sequence (GBSx1634) was identified in S.agalactiae <SEQ ID 4757> which encodes the amino
acid sequence <SEQ ID 4758>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.36 Transmembrane 3 - 19 ( 1 - 27)

Final Results

bacterial membrane Certainty=0.5543 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13675 GB:Z99113 alternate gene name: yoxG [Bacillus subtilis] 
Identities = 26/72 (36%) , Positives = 42/72 (58%) 

Query: 1 MSITIWILLIIVALFGGLVGGIFIARKQIEKEIGEHPRLTPDAIREMMSQMGQKPSEAKV 60 
M++ + IL+ +VAL G+ G FIARK + + ++P + +R MM QMG KPS+ K+ 
Sbjct: 1 MTLWVGILVGVVALLIGVALGFFIARKYMMSYLKKNPPINEQMLRMMMMQMGMKPSQKKI 60

Query: 61 QQTYRNIVKHAK 72



Q + + K 
Sbjct: 61 NQMMKAMNNQTK 72 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4759> which encodes the amino acid 
sequence <SEQ ID 4760>. Analysis of this protein sequence reveals the following:

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.72 Transmembrane 7 - 23 ( 1 - 27)

Final Results

bacterial membrane Certainty=0.5288 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 62/79 (78%) , Positives = 69/79 (86%) 

Query: 1 MSITIWILLIIVALFGGLVGGIFIARKQIEKEIGEHPRLTPDAIREMMSQMGQKPSEAKV 60
MS IWILL+IVAL G+ GGIFIARKQIEKEIGEHPRLTP+AIREMMSQMGQKPSEAK+
Sbjct: 1 MSTAIWILLLIVALGVGVFGGIFIARKQIEKEIGEHPRLTPEAIREMMSQMGQKPSEAKI 60

Query: 61 QQTYRNIVKHAKTAIKTKK 79

QQTYRNI+K +K A+ K
Sbjct: 61 QQTYRNIIKQSKAAVSKGK 79

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1544 

A DNA sequence (GBSx1635) was identified in S.agalactiae <SEQ ID 4761> which encodes the amino
acid sequence <SEQ ID 4762>. Analysis of this protein sequence reveals the following:

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.86 Transmembrane 82 - 98 ( 79 - 103)

Final Results

bacterial membrane Certainty=0.4142 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1545 

A DNA sequence (GBSx1636) was identified in S.agalactiae <SEQ ID 4763> which encodes the amino
acid sequence <SEQ ID 4764>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.83 Transmembrane 56 - 72 ( 50 - 105) 

INTEGRAL Likelihood = -7.27 Transmembrane 27 - 43 ( 17 - 48)

INTEGRAL Likelihood = -6.26 Transmembrane 76 - 92 ( 73 - 105) 

INTEGRAL Likelihood = -4.83 Transmembrane 119 - 135 ( 118 - 141) 

INTEGRAL Likelihood = -1.65 Transmembrane 160 - 176 ( 160 - 176) 
Final Results

bacterial membrane Certainty=0.5331 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 8837> which encodes amino acid sequence <SEQ ID 8838> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4765> which encodes the amino acid
sequence <SEQ ID 4766>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.99 Transmembrane 45 - 61 ( 37 - 94)
INTEGRAL Likelihood = -7.06 Transmembrane 74 - 90 ( 62 - 94)
INTEGRAL Likelihood = -3.45 Transmembrane 110 - 126 ( 108 - 130)
INTEGRAL Likelihood = -2.18 Transmembrane 149 - 165 ( 149 - 165)
INTEGRAL Likelihood = -1.91 Transmembrane 21 - 37 ( 20 - 37)

Final Results

bacterial membrane Certainty=0.5394 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below. 



Identities = 111/173 (64%) , Positives = 145/173 (83%) 

Query: 12 MSKKTTQMVSYTSILVAFAIMIPIIMPAKIIIGPASFTLASHVPLFLSIFISVPVAILVA 71

M+KK TQ+++YTSILVAFAI+IPIIMP K+IIGPASFTLASHVPLFL+IF+S+PVAILVA
Sbjct: 1 MTKKPTQLIAYTSILVAFAILIPIIMPLKLIIGPASFTLASHVPLFLAIFMSIPVAILVA 60

Query: 72 LGTGLGFLLAGFPIVIVLRALSHIGFALIAAFLIKSKPSLLMSKWQTLLFAVAINIIHGL 131 

LGT LGFLLAG P++IVLRALSH+ FA++AA+ + KP L+ S + FA IN+IHGL 
Sbjct: 61 LGTTLGFLLAGLPLIIVLRALSHLLFAILAAWWLSRKPQLMTSAVKCFSFAFFINVIHGL 120 

Query: 132 LEFITVYIITMTSNSSSTYLWSLFSLIGLGSLLHGLVDFYIALFIWKWMTQKL 184

EF+ VYI+T T+ +S +Y WS+ LIGLGSL+HG++DFY+AL +W+++ + L
Sbjct: 121 AEFLVVYILTATTATSMSYFWSMLGLIGLGSLIHGILDFYLALVLWRFLAKNL 173

A related GBS gene <SEQ ID 10789> and protein <SEQ ID 10790> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG: 0 

McG: Length of UR: 24 

Peak Value of UR: 3.16 

Net Charge of CR: 2 
McG: Discrim Score: 12.56 
GvH: Signal Score (-7.5): -0.16 

Possible site: 19 
>>> Seems to have a cleavable N-term signal seq.

Amino Acid Composition: calculated from 20
ALOM program count: 5 value: -10.83 threshold: 0.0
INTEGRAL Likelihood =-10.83 Transmembrane 45 - 61 ( 39 - 94)
INTEGRAL Likelihood = -6.26 Transmembrane 65 - 81 ( 62 - 94)
INTEGRAL Likelihood = -4.83 Transmembrane 108 - 124 ( 107 - 130)
INTEGRAL Likelihood = -1.65 Transmembrane 149 - 165 ( 149 - 165)
INTEGRAL Likelihood = -0.27 Transmembrane 24 - 40 ( 24 - 40)
PERIPHERAL Likelihood = 0.42 86
modified ALOM score: 2.67
icml HYPID: 7 CFP: 0.533 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.5331 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1546 

A DNA sequence (GBSx1637) was identified in S.agalactiae <SEQ ID 4767> which encodes the amino
acid sequence <SEQ ID 4768>. This protein is predicted to be a transcriptional regulator of the biotin
repressor family. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2237 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14749 GB:Z99118 yrxA [Bacillus subtilis] 
Identities = 72/165 (43%) , Positives = 112/165 (67%) , Gaps = 2/165 (1%) 

Query: 6 RRENILTTLKGTKEAISASTLAKIFSVSRQVIVGDIALLRAQQCDIISTPKGYL-MSSAL 64
RR+ +L LK +K ++ LAK +VSRQVIV DI+LL+A+ II+T +GY+ M +A
Sbjct: 12 RRDQLLLWLKESKSPLTGGELAKKAWSRQVIVQDISLLKAKNVPIIATSCGYVYMDAAA 71

Query: 65 STHQFTARLV-CQHGIEQTEEELEIILRYQGIIMNVEVEHPIYGMLTAPLNIQSQKDIDN 123

HQ R++ C HG E+TEEEL++I+ + +V++EHP+YG LTA + + ++K++ + 

Sbjct: 72 QQHQQAERIIACLHGPERTEEELQLIVDEGVTVKDVKIEHPVYGDLTAAIQVGTRKEVSH 131 

Query: 124 FTAKLKVSNAELLSSLTDGLHTHMISCQDQSVFDQICEALKKAGI 168 

F K+ +NA LS LTDG+H H ++ D+ DQ C+AL++AGI 
Sbjct: 132 FIKKINSTNAAYLSQLTDGVHLHTLTAPDEHRIDQACQALEEAGI 176 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4769> which encodes the amino acid 
sequence <SEQ ID 4770>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2971 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 109/170 (64%) , Positives = 136/170 (79%) 

Query: 1 MKAQERRENILTTLKGTKEAISASTLAKIFSVSRQVIVGDIALLRAQQCDIISTPKGYLM 60 

MKA++RR+ I+ L ++A+SA+ L K+ VSRQVTVGDIALLRAQQ DIISTPKGY+M
Sbjct: 1 MICAEDRRQKIIECI^SEQKAVSATRLGKLLGVSRQVTVGDIALLRAQQIDIISTPKGYIM 60 

Query: 61 SSALSTHQFTARLVCQHGIEQTEEELEIILRYQGIIMNVEVEHPIYGMLTAPLNIQSQKD 120
S+AL +HQF AR+VCQH +E+T++ELEIIL +QGII VEVEHPIYGM+TAPLNI++ D
Sbjct: 61 STALYSHQFCARIVCQHNVEETKKELEIILAHQGIITTVEVEHPIYGMITAPLNIKTHSD 120

Query: 121 IDNFTAKLKVSNAELLSSLTDGLHTHMISCQDQSVFDQICEALKKAGILY 170
+ NF +KL S AELLSSLT+GLH+H+ISC Q F I L+ AGILY

Sbjct: 121 VTNFMSKLSQSKAELLSSLTEGLHSHLISCPSQEAFIAIKHDLELAGILY 170 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1547

A DNA sequence (GBSx1638) was identified in S.agalactiae <SEQ ID 4771> which encodes the amino
acid sequence <SEQ ID 4772>. Analysis of this protein sequence reveals the following: 



Possible site: 37 

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -8.44 Transmembrane 143 - 159 ( 138 - 165)
INTEGRAL Likelihood = -8.17 Transmembrane 164 - 180 ( 160 - 184)
INTEGRAL Likelihood = -7.17 Transmembrane 56 - 72 ( 53 - 78)
INTEGRAL Likelihood = -5.63 Transmembrane 24 - 40 ( 21 - 44)
INTEGRAL Likelihood = -4.94 Transmembrane 113 - 129 ( 108 - 131)
INTEGRAL Likelihood = -2.39 Transmembrane 86 - 102 ( 86 - 103)
INTEGRAL Likelihood = -1.06 Transmembrane 203 - 219 ( 203 - 219)

Final Results

bacterial membrane Certainty=0.4376 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 10069> which encodes amino acid sequence <SEQ ID 
10070> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC18360 GB:AF064763 putative membrane spanning protein 
[Lactococcus lactis subsp. cremoris] 
Identities = 97/188 (51%) , Positives = 133/188 (70%) 

Query: 38 IMLYMFPQNMIAIMQKMPGLYFGAIILELVLVFVASGAARRNTPAALPLFLIYSALNGFT 97

IM+ F NM AI+Q I+ LV+V G A +N+ ALP+F+ Y+A GF

Sbjct: 1 IMITFFLDNMRAILQSGSLFLLVLWIIPLVMVVSLQGLAMKNSKMALPIFIGYAAFMGFL 60

Query: 98 LSFIIARYTQTTVLQAFITSAAVFFAMALIGAKTKKDLSGMRKALMAALIGILIASLVNL 157 
+SF + YT T + AFIT++A+FF +++ G TK++LSGM KAL A+ G+++A L+NL

Sbjct: 61 ISFTLLMYTATDITLAFITASAMFFGLSVYGRFTKRNLSGMGKALGVAVWGLIVAMLLNL 120

Query: 158 FIGSGGMSYIISIVCVIIFSGLIAYDNQMIKYVYNSQGGQVADGWAVSMALSLYLDFINL 217
F S G++ +IS+V V+IFSGLIA+DNQ I VYN+ GQV+DGWA+SMALSLYLDFIN+
Sbjct: 121 FFASTGLTILISLVGVVIFSGLIAWDNQKITQVYNAHNGQVSDGWAISMALSLYLDFINM 180

Query: 218 FLNILRLF 225 

FL +LRLF 
Sbjct: 181 FLFLLRLF 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4773> which encodes the amino acid 
sequence <SEQ ID 4774>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.97 Transmembrane 143 - 159 ( 138 - 165)

INTEGRAL Likelihood = -5.89 Transmembrane 164 - 180 ( 160 - 184)

INTEGRAL Likelihood = -5.68 Transmembrane 56 - 72 ( 55 - 77)

INTEGRAL Likelihood = -4.78 Transmembrane 113 - 129 ( 110 - 130)

INTEGRAL Likelihood = -2.81 Transmembrane 203 - 219 ( 203 - 222)

INTEGRAL Likelihood = -2.76 Transmembrane 24 - 40 ( 23 - 41)

INTEGRAL Likelihood = -2.76 Transmembrane 86 - 102 ( 86 - 104)

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
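Each "Final Results" block lists one location per line with a certainty value and a call. Purely as an illustration, the following Python sketch picks the predicted location out of such a block. It assumes the OCR spacing inside the certainty value has already been removed, and it simply returns the highest-certainty line marked "Affirmative", which in every block in this section is also the first line listed. The function name `predicted_location` is hypothetical and is not part of PSORT.

```python
import re

# One PSORT "Final Results" line, for example
# "bacterial membrane Certainty=0.4588 (Affirmative) < succ>".
# The optional "---" separator seen in some listings is tolerated.
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<loc>cytoplasm|membrane|outside)[\s\-]+"
    r"Certainty=(?P<cert>\d+\.\d+)\s+\((?P<call>Affirmative|Not Clear)\)"
)


def predicted_location(lines):
    """Return (location, certainty) for the highest-certainty line marked
    'Affirmative', or None when no line is marked 'Affirmative'."""
    best = None
    for line in lines:
        match = RESULT_LINE.search(line)
        if match is None or match.group("call") != "Affirmative":
            continue
        certainty = float(match.group("cert"))
        if best is None or certainty > best[1]:
            best = (match.group("loc"), certainty)
    return best


block = [
    "bacterial membrane Certainty=0.4588 (Affirmative) < succ>",
    "bacterial outside Certainty=0.0000 (Not Clear) < succ>",
    "bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>",
]
print(predicted_location(block))  # ('membrane', 0.4588)
```

For a block in which no location is marked "Affirmative", such as the all-zero results reported for the acylphosphatase sequence, the sketch returns None.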

The protein has homology with the following sequences in the databases:

>GP:AAC18360 GB:AF064763 putative membrane spanning protein 
[Lactococcus lactis subsp. cremoris] 
Identities = 90/189 (47%) , Positives = 133/189 (69%) 

Query: 38 LMLYPFRENLISILVNQPMIYYGAAIIELILVFVASSAARKNTPAALPIFLIYSALNGFT 97

+M+ F +N+ +IL + + II L++V A KN+ ALPIF+ Y+A GF

Sbjct: 1 IMITFFLDNMRAILQSGSLFLLVLWIIPLVMVVSLQGLAMKNSKMALPIFIGYAAFMGFL 60

Query: 98 LSFIIVAYAQTTVFQAFLSSAAVFFAMSIIGVKTKRDMSGLRKAMFAALIGVVVASLINL 157
+SF ++ Y T + AF++++A+FF +S+ G TKR++SG+ KA+ A+ G++VA L+NL

Sbjct: 61 ISFTLLMYTATDITLAFITASAMFFGLSVYGRFTKRNLSGMGKALGVAVWGLIVAMLLNL 120

Query: 158 FIGSGMMSYVISVISVLIFSGLIASDNQMIKRVYQATNGQVGDGWAVAMALSLYLDFINL 217
F S ++ +IS++ V+IFSGLIA DNQ I +VY A NGQV DGWA++MALSLYLDFIN+
Sbjct: 121 FFASTGLTILISLVGVVIFSGLIAWDNQKITQVYNAHNGQVSDGWAISMALSLYLDFINM 180

Query: 218 FISLLRIFG 226

F+ LLR+FG
Sbjct: 181 FLFLLRLFG 189

An alignment of the GAS and GBS proteins is shown below. 
Identities = 167/229 (72%) , Positives = 202/229 (87%) 

Query: 1 MNDNVIYTQSDSGLNQFFAKIYGLVGIGVGLSAAVSAIMLYMFPQNMIAIMQKMPGLYFG 60
MND+VIYTQSD GLNQFFAKIY LVG+GVGLSA VS +MLY F +N+I+I+ P +Y+G

Sbjct: 1 MNDHVIYTQSDVGLNQFFAKIYSLVGMGVGLSAFVSYLMLYPFRENLISILVNQPMIYYG 60

Query: 61 AIILELVLVFVASGAARRNTPAALPLFLIYSALNGFTLSFIIARYTQTTVLQAFITSAAV 120
A I+EL+LVFVAS AAR+NTPAALP+FLIYSALNGFTLSFII  Y QTTV QAF++SAAV
Sbjct: 61 AAIIELILVFVASSAARKNTPAALPIFLIYSALNGFTLSFIIVAYAQTTVFQAFLSSAAV 120

Query: 121 FFAMALIGAKTKKDLSGMRKALMAALIGILIASLVNLFIGSGGMSYIISIVCVIIFSGLI 180

FFAM++IG KTK+D+SG+RKA+ AALIG+++ASL+NLFIGSG MSY+IS++ V+IFSGLI
Sbjct: 121 FFAMSIIGVKTKRDMSGLRKAMFAALIGVVVASLINLFIGSGMMSYVISVISVLIFSGLI 180

Query: 181 AYDNQMIKYVYNSQGGQVADGWAVSMALSLYLDFINLFLNILRLFARND 229

A DNQMIK VY + GQV DGWAV+MALSLYLDFINLF+++LR+F RND
Sbjct: 181 ASDNQMIKRVYQATNGQVGDGWAVAMALSLYLDFINLFISLLRIFGRND 229
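The percentages in the Identities, Positives and Gaps lines are a count divided by the aligned length. In this section every Identities and Gaps percentage agrees with integer truncation (for example, Gaps = 4/222 is printed as 1%, whereas rounding 1.8% would give 2%), while several Positives percentages are one point below the truncated value (202/229 is 88.2%, yet 87% is printed in the alignment immediately above). The following Python sketch, a hypothetical helper and not part of BLAST, recomputes the figures under the truncation assumption so that such discrepancies can be flagged.

```python
import re

# Matches statistics such as "Identities = 167/229 (72%)".
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def truncated_percent(count, length):
    """Percentage truncated toward zero, computed with integer arithmetic."""
    return (100 * count) // length


def check_stats(line):
    """Return (label, printed %, recomputed %) for each statistic on the line."""
    return [
        (label, int(shown), truncated_percent(int(count), int(length)))
        for label, count, length, shown in STAT.findall(line)
    ]


line = "Identities = 167/229 (72%) , Positives = 202/229 (87%)"
for label, shown, recomputed in check_stats(line):
    note = "" if shown == recomputed else "  (differs)"
    print(f"{label}: printed {shown}%, truncated {recomputed}%{note}")
# Identities: printed 72%, truncated 72%
# Positives: printed 87%, truncated 88%  (differs)
```

The truncation rule is an inference from the figures in this section, not a documented property of the program that produced them.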

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1548 

A DNA sequence (GBSx1639) was identified in S.agalactiae <SEQ ID 4775> which encodes the amino
acid sequence <SEQ ID 4776>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2495 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



WO 02/34771 



PCT/GB01/04789 



-1718- 



A related GBS nucleic acid sequence <SEQ ID 10071> which encodes amino acid sequence <SEQ ID 
10072> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4777> which encodes the amino acid
sequence <SEQ ID 4778>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3277 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 127/163 (77%) , Positives = 141/163 (85%)

Query: 7 YQDDKDFMDLVGHLIDHPRFQKLEAIVQHHHSTRLEHSINVSYTSYKIAKKFGWDASSTA 66
Y +DK++M+ VGHLI HPRFQKL  IVQH HSTRLEHSINVSY+SYK+AK+FGWDA STA
Sbjct: 3 YTEDKEYMEHVGHLIAHPRFQKLSHIVQHQHSTRLEHSINVSYSSYKLAKRFGWDAKSTA 62

Query: 67 RGGLLHDFFYYDWRVTKFNKSHAWVHPRIAVRNARKLTDLNAREEDIILKHMWGATIAPP 126

RGGLLHDFFYYDWRVTKFNK HAWVHPRIAVRNA+KLT+LN +EEDIILKHMWGATIA P
Sbjct: 63 RGGLLHDFFYYDWRVTKFNKGHAWVHPRIAVRNAKKLTELNKKEEDIILKHMWGATIAFP 122

Query: 127 RYKESYIVTMVDKYWAVREASRPLKRIFKKPIRFSRKFLGSHN 169

RYKESYIVTMVDKYWAV+EA PL++ + RK L SHN

Sbjct: 123 RYKESYIVTMVDKYWAVKEAVTPLRQKWSNRRFLRRKTLQSHN 165

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1549 

A DNA sequence (GBSx1640) was identified in S.agalactiae <SEQ ID 4779> which encodes the amino
acid sequence <SEQ ID 4780>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.03 Transmembrane 213 - 229 ( 212 - 229)

Final Results

bacterial membrane Certainty=0.2211 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9413> which encodes amino acid sequence <SEQ ID 9414>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14825 GB:Z99118 similar to rRNA methylase [Bacillus subtilis]
Identities = 96/228 (42%) , Positives = 143/228 (62%) , Gaps = 5/228 (2%)

Query: 3 QKKYRKSSYLIEGWHLFEEAEKYGAQFI^IFVT-ETAIDR-LRKPERAIVVTDDVLKELT 60

+++ + +++LIEG HL EEA K I V ET I L + ++++D +T

Sbjct: 22 KERTKTNTFLIEGEHLVEEALKSPGIVKEILVKDETRIPSDLETGIQCYMLSEDAFSAVT 81

Query: 61 DSQTPQGIVAEIAFQETRWTDIKKGRFLVLEDVQDPGNLGTMVRTADAANFDAVFLSQKS 120
+++TPQ I A E + +K L+++ VQDPGNLGTM+RTADAA DAV L +

Sbjct: 82 ETETPQQIAAVCHMPEEKLATARK--VLLIDAVQDPGNLGTMIRTADAAGLDAVVLGDGT 139

Query: 121 ADLYNQKTLRSMQGSHFHLPVFRVEIEQFVNFCKAEGITMIATTLSEQSVNYKNLPKYDY 180

AD +N KTLRS QGSHFH+PV R + +V+ KAEG+ + T L + Y+ +P+ +
Sbjct: 140 ADAFNGKTLRSAQGSHFHIPVVRRNLPSYVDELKAEGVKVYGTAL-QNGAPYQEIPQSES 198

Query: 181 FALIMGNEGQGISKTMTEEADVLAHIEMPGQAESLNVAVAAGVVIFSL 228

FALI+GNEG G+ + E+ D+ ++ + GQAESLNVAVAA ++++ L
Sbjct: 199 FALIVGNEGAGVDAALLEKTDLNLYVPLYGQAESLNVAVAAAILVYHL 246

A related DNA sequence was identified in S.pyogenes <SEQ ID 4781> which encodes the amino acid
sequence <SEQ ID 4782>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.97 Transmembrane 229 - 245 ( 228 - 245)

Final Results

bacterial membrane Certainty=0.2190 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 141/229 (61%) , Positives = 178/229 (77%)

Query: 1 MLQKKYRKSSYLIEGWHLFEEAEKYGAQFIiNIFVTETAIDRLRKPERAIVVTDDVLKELT 60

+LQKK+RK SYLIEGWHLFEEA+K G F +IFV E ++RL + ++V+ VLKELT
Sbjct: 17 LLQKKHRKQSYLIEGWHLFEEAQKSGQVFRHIFVLEEMVERLftGEQELVIVSPQVLKELT 76

Query: 61 DSQTPQGIVAEIAFQETRWTDIKKGRFLVLEDVQDPGNLGTMVRTADAANFDAVFLSQKS 120

DS +PQGIVAE+ + + KG++LVLEDVQDPGNLGT++RTADAA FD VFLS+KS
Sbjct: 77 DSPSPQGIVAEVEIPKIiAFPSDYKGKYLVLEDVQDPGNLGTIIRTADAARFDGVFLSEKS 136

Query: 121 ADLYNQKTLRSMQGSHFHLPVFRVEIEQFVNFCKAEGITMIATTLSEQSVNYKNLPKYDY 180

AD+YNQKTLRSMQGSHFHLP++R ++ Q + ++ATTLS++SV+YK+L ++

Sbjct: 137 ADIYNQKTLRSMQGSHFHLPIWRTDVYQLCRELQEYETPILATTLSKKSVDYKSLTHHER 196

Query: 181 FALIMGNEGQGISKTMTEEADVLAHIEMPGQAESLNVAVAAGVVIFSLI 229

AL++GNEGQGIS M AD L HI MPGQAESLNVAVAAG++IFSLI
Sbjct: 197 LALVLGNEGQGISAEMAAIADQLVHITMPGQAESLNVAVAAGILIFSLI 245

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8839> and protein <SEQ ID 8840> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -7.98
GvH: Signal Score (-7.5): -3.86

Possible site: 37
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -3.03 threshold: 0.0 

INTEGRAL Likelihood = -3.03 Transmembrane 213 - 229 ( 212 - 229) 
PERIPHERAL Likelihood = 5.14 149 
modified ALOM score: 1.11 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.2211 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>




The protein has homology with the following sequences in the databases: 

ORF02468(259 - 984 of 1287) 

EGAD|107730|BS2859 (4 - 246 of 248) hypothetical protein {Bacillus subtilis}
GP 1770029 eiribCftA99602.il |Z7520B hypothetical protein {Bacillus subtilis}

GP|2635330|emb|CAB14825.1||Z99118 similar to rRNA methylase {Bacillus subtilis}
PIR|G69984|G69984 rRNA methylase homolog ysgA - Bacillus subtilis
%Match =20.3

%Identity =43.0 %Similarity =62.3
Matches = 105 Mismatches = 87 Conservative Sub.s = 47

186 216 246 276 306 330 360 390 

A*RNPTP*TRPETIK*TFFIT*PLF*YNRXMTTIITSKSM^ 

I I I =1 III II I :::|||| || ||| | 
MKQIESAKNQKVKDWKKLHTKKERTKTNTFLIEGEHLVEEALKSPGIVK 
10 20 30 40 

417 444 474 504 534 564 594 624 

NIFVT-ETAI-DRLRKPERAIVVTDDVLKELTDSQTPQGIVAEIAFQETRWTDIKKGRFLVLEDVQDPGNLGTMVRTADA
|:| 11 1 I : : : |: = :||| II | : : |::: I | I I I I I I I I = I I I I I 

EILVKDETRIPSDLETGIQCYMLSEDAFSAVTETETPQQIAAVCHMPEEKLA--TARKVLLIDAVQDPGNLGTMIRTADA 
60 70 80 90 100 110 120 






654 684 714 744 774 . 804 834 864 

ANFDAVFLSQKSADLYNQKTLRSMQGSHFHLPVFRVEIEQFVNFCKAEGITMIATTLSEQSVNYKNLPKYDYFALIMGNE
I =IH I =11 =1 Hill IIMIhll | : :|: ||||: : | | : | : : | : : |||| : ||| 

AGLDAVVLGDGTADAFNGKTLRSAQGSHFHIPVVRRNLPSYVDELKAEGVKVYGTAL-QNGAPYQEIPQSESFALIVGNE
140 150 160 170 180 190 200 

894 924 954 984 1014 1044 1074 1104

GQGISKTMTEEADVLAHIEMPGQAESLNVAVAAGVVIFSLI *VHMIi*YPQRGDYNEKVSRR*GLHGFGRSPY* PSTFPKT
11= = I: |: :: : |||||||||||| :::: | 
GAGVDAALLEKTDLNLYVPLYGQAESLNVAVAAAILVYHLRG 
220 230 240 

SEQ ID 8840 (GBS430) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 77 (lane 5; MW 29kDa). 

GBS430-GST was purified as shown in Figure 220, lane 8. 
Example 1550 

A DNA sequence (GBSx1641) was identified in S.agalactiae <SEQ ID 4783> which encodes the amino
acid sequence <SEQ ID 4784>. This protein is predicted to be acylphosphatase (acyP). Analysis of this
protein sequence reveals the following:

Possible site: 48
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10073> which encodes amino acid sequence <SEQ ID
10074> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAD36630 GB:AE001801 acylphosphatase, putative [Thermotoga maritima] 
Identities = 35/88 (39%), Positives = 52/88 (58%), Gaps = 3/88 (3%) 

Query: 24 MKKVHLIVSGRVQGVGFRYATYSLALEIGDIYGRVVNNDDGTVEILAQSTDSNKMTQFIQ 83
MK + + V G VQGVGFRY T +A +G + G V N DDG+V I A+ D N + +F+
Sbjct: 1 MKALKIRVEGIVQGVGFRYFTRRVAKSLG-VKI3YVMNMDDGSVFIHAEG-DENALRRFLN 58

Query: 84 KIRKGPSKWSKVTYVDIKLDNFDDFNDF 111
++ KGP + VT V ++ + + DF
Sbjct: 59 EVAKGPPA-AVVTNVSVEETTPEGYEDF 85

A related DNA sequence was identified in S.pyogenes <SEQ ID 4785> which encodes the amino acid 
sequence <SEQ ID 4786>. Analysis of this protein sequence reveals the following: 

Possible site: 34 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2433 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 69/95 (72%) , Positives = 85/95 (88%)

Query: 19 KRGQVMKKVHLIVSGRVQGVGFRYATYSLALEIGDIYGRVVNNDDGTVEILAQSTDSNKM 78

K +M+KV LIVSGRVQGVGFRYAT++LAL+IGDIYGRVVNN+DGTVEILAQS DS+K+
Sbjct: 7 KEALLMQKVRLIVSGRVQGVGFRYATHTLALDIGDIYGRVVNNNDGTVEILAQSKDSDKI 66

Query: 79 TQFIQKIRKGPSKWSKVTYVDIKLDNFDDFNDFKM 113
FIQ++RKGPSKW+KVTYVD+ + NF+DF DF++

Sbjct: 67 ATFIQEVRKGPSKWAKVTYVDVTMANFEDFQDFQI 101

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
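In the BLAST-style alignments, the middle line repeats a residue where query and subject agree and prints "+" where a substitution scores positively. The Python sketch below rebuilds only the identity part of that line. Reproducing the "+" marks would require the scoring matrix used for the searches, which is not stated here (BLOSUM62 would be a common choice, but that is an assumption), so those columns are left blank. The example uses the first sixteen columns of the "Query: 79" and "Sbjct: 67" rows of the acylphosphatase alignment above; the function name is hypothetical.

```python
def identity_midline(query, sbjct):
    """Simplified BLAST-style middle line: the residue where both aligned rows
    agree, a space otherwise. Gap columns ('-') never count as identities."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, sbjct))


query = "TQFIQKIRKGPSKWSK"  # start of the "Query: 79" row
sbjct = "ATFIQEVRKGPSKWAK"  # start of the "Sbjct: 67" row
print(repr(identity_midline(query, sbjct)))  # '  FIQ  RKGPSKW K'
```

Compared with the printed middle line (FIQ++RKGPSKW+K), the output carries the same letters, with a space wherever the printed line has a "+".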

Example 1551

A DNA sequence (GBSx1642) was identified in S.agalactiae <SEQ ID 4787> which encodes the amino
acid sequence <SEQ ID 4788>. This protein is predicted to be membrane protein homolog (yidC). Analysis
of this protein sequence reveals the following:

Possible site: 16
>>> May be a lipoprotein

INTEGRAL Likelihood =-12.52 Transmembrane 60 - 76 ( 54 - 83)
INTEGRAL Likelihood = -3.66 Transmembrane 178 - 194 ( 177 - 196)
INTEGRAL Likelihood = -2.76 Transmembrane 140 - 156 ( 137 - 157)
INTEGRAL Likelihood = -2.60 Transmembrane 216 - 232 ( 213 - 232)

Final Results

bacterial membrane Certainty=0.6010 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10075> which encodes amino acid sequence <SEQ ID 
10076> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF03934 GB:AF139908 membrane protein homolog [Listeria monocytogenes]

Identities = 82/222 (36%) , Positives = 133/222 (58%) , Gaps = 4/222 (1%)

Query: 44 PMANLITYFAQHQGLGFGVAIIIVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPI 103
P + I + A+ G +G+AIII T+++R +I+PL L + KMA KP + I

Sbjct: 3 PFTSFIMFVAKFVGGNYGIAIIITTLLIRALIMPLNLRTAKAQMGMQSKMAVAKPEIDEI 62
Query: 104 NERLRNAKTQEEKLAAQTELMTAQRENGLSMFGGIGCLPLLIQMPFFSAIFFAARYTPGV 163

RL+ A ++EE+ Q E+M + ++ +GCLPLLIQMP A ++A R + +
Sbjct: 63 QARLKRATSKEEQATIQKEMMAVYSKYNINPMQ-MGCLPLLIQMPILMAFYYAIRGSSEI 121

Query: 164 SSATFLGLNLGQKSLTLTVIIAILYFVQSWLSMQGVPDEQRQQMKTMMYLMPIMMVFMSI 223

+S TFL NLG + I) +1 ++Y Q ++SM G EQ++QMK + + PIM++F+S
Sbjct: 122 ASHTFLWFNLGSPDMVLAIIAGLVYLAQYFVSMIGYSPEQKKQMKIIGLMSPIMILFVSF 181

Query: 224 SLPASVALYWFIGGIFSIIQQLVT--TYVLK-PKLRRKVEEE 262

+ P+++ALYW +GG+F Q L+T Y+ K P+++ +EE 
Sbjct: 182 TAPSALALYWAVGGLFLAGQTLLTKKLYMNKHPEIKVMEQEE 223 
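Each alignment row is labelled with the positions of its first and last residues; a gap character "-" occupies a column but does not advance the residue position. The Python sketch below (helper names are hypothetical) computes the expected end coordinate from a row's start coordinate and text. Comparing the result with the printed end coordinate is a simple way to detect rows in which residues were lost or merged during text extraction, for example where two adjacent residues have been read as a single character. The example rows are taken from the Listeria alignment above.

```python
def row_end(start, aligned_text):
    """Position of the last residue in an alignment row.

    Letters are residues; '-' marks a gap and does not advance the
    position, so only alphabetic characters are counted.
    """
    residues = sum(1 for ch in aligned_text if ch.isalpha())
    return start + residues - 1


def check_row(start, aligned_text, printed_end):
    """True if the printed end coordinate agrees with the row text."""
    return row_end(start, aligned_text) == printed_end


# Query row 224-262, which contains three gap columns.
print(row_end(224, "SLPASVALYWFIGGIFSIIQQLVT--TYVLK-PKLRRKVEEE"))  # 262
print(check_row(182, "TAPSALALYWAVGGLFLAGQTLLTKKLYMNKHPEIKVMEQEE", 223))  # True
```

Rows whose computed end differs from the printed value are candidates for re-checking against the original listing.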



A related DNA sequence was identified in S.pyogenes <SEQ ID 4789> which encodes the amino acid 
sequence <SEQ ID 4790>. Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> May be a lipoprotein

INTEGRAL Likelihood = -9.55 Transmembrane 62 - 78 ( 54 - 82)

INTEGRAL Likelihood = -2.81 Transmembrane 178 - 194 ( 177 - 195)

INTEGRAL Likelihood = -0.90 Transmembrane 216 - 232 ( 215 - 232)

Final Results

bacterial membrane Certainty=0.4821 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF03934 GB:AF139908 membrane protein homolog [Listeria monocytogenes] 
Identities = 89/218 (40%) , Positives = 132/218 (59%) , Gaps = 2/218 (0%)

Query: 43 KPMSYFIDYFANNAGLGYGLAIIIVTIIVRTLILPLGLYQSWKASYQSEKMAFLKPVFEP 102 

+P + FI + A G YG+AIII T+++R LI+PL L + KMA KP + 

Sbjct: 2 QPFTSFIMFVAKFVGGNYGIAIIITTLLIRALIMPLNLRTAKAQMGMQSKMAVAKPEIDE 61 

Query: 103 INKRIKQANSQEEKMAAQTELMAAQRAHGINPLGGIGCLPLLIQMPFFSAMYFAAQYTKG 162

I R+K+A S+EE+ Q E+MA + INP+ +GCLPLLIQMP A Y+A + +
Sbjct: 62 IQARLKRATSKEEQATIQKEMMAVYSKYNINPMQ-MGCLPLLIQMPILMAFYYAIRGSSE 120

Query: 163 VSTSTFMGIDLGSRSLVLTAIIAALYFFQSWLSMMAVSEEQREQMKTMMYTMPIMMIFMS 222
+++ TF+ +LGS +VL I +Y Q ++SM+ S EQ++QMK + PIM++F+S

Sbjct: 121 IASHTFLWFNLGSPDMVLAIIAGLVYLAQYFVSMIGYSPEQKKQMKIIGLMSPIMILFVS 180

Query: 223 FSLPAGVGLYWLVGGFFSIIQQLITTYLLKPRLHKQIK 260
F+ P+ + LYW VGG F Q L+T L + H +IK
Sbjct: 181 FTAPSALALYWAVGGLFLAGQTLLTKKLYMNK-HPEIK 217

An alignment of the GAS and GBS proteins is shown below.

Identities = 203/309 (65%) , Positives = 254/309 (81%) , Gaps = 2/309 (0%)

Query: 1 MKKTLKRILFSSLSLSMLLLLTGCVSVDKAGKPYGVIWNTLGVPMANLITYFAQHQGLGF 60

+K TL RILFS L+LS+LL LTGCV D G P G+IW LG PM+ I YFA + GLG+
Sbjct: 1 LKLTLNRILFSGLALSILLTLTGCVGRDAHGNPKGMIWEFLGKPMSYFIDYFANNAGLGY 60

Query: 61 GVAIIIVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPINERLRNAKTQEEKLAAQ 120
G+AIIIVT+IVR +ILPLGLYQSWKASYQ+EKMA+ KP+FEPIN+R++ A +QEEK+AAQ

Sbjct: 61 GLAIIIVTIIVRTLILPLGLYQSWKASYQSEKMAFLKPVFEPINKRIKQANSQEEKMAAQ 120

Query: 121 TELMTAQRENGLSMFGGIGCLPLLIQMPFFSAIFFAARYTPGVSSATFLGLNLGQKSLTL 180
TELM AQR +G++ GGIGCLPLLIQMPFFSA++FAA+YT GVS++TF+G++LG +SL L
Sbjct: 121 TELMAAQRAHGINPLGGIGCLPLLIQMPFFSAMYFAAQYTKGVSTSTFMGIDLGSRSLVL 180

Query: 181 TVIIAILYFVQSWLSMQGVPDEQRQQMKTMMYLMPIMMVFMSISLPASVALYWFIGGIFS 240

T IIA LYF QSWLSM V +EQR+QMKTMMY MPIMM+FMS SLPA V LYW +GG FS
Sbjct: 181 TAIIAALYFFQSWLSMMAVSEEQREQMKTMMYTMPIMMIFMSFSLPAGVGLYWLVGGFFS 240






Query: 241 IIQQLVTTYVLKPKLRRKVEEEYTKNPPKAYKANNARKDVTNSTKATESNQAIITSKKTN 300

IIQQL+TTY+LKP+L ++++EEY KNPPKAY++ ++RKDVT S ++N + K+N
Sbjct: 241 IIQQLITTYLLKPRLHKQIKEEYAKNPPKAYQSTSSRKDVTPSQNMEQAN--LPKKIKSN 298 

Query: 301 RNAGKQKRR 309 

RNAGKQ++R 
Sbjct: 299 RNAGKQRKR 307 

A related GBS gene <SEQ ID 8841> and protein <SEQ ID 8842> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 23 Crend: 6 
McG: Discrim Score: 8.74 
GvH: Signal Score (-7.5): -1.47 

Possible site: 16 
>>> May be a lipoprotein 

ALOM program count: 4 value: -12.52 threshold: 0.0 

INTEGRAL Likelihood =-12.52 Transmembrane 60 - 76 ( 54 - 83) 
INTEGRAL Likelihood = -3.66 Transmembrane 178 - 194 ( 177 - 196) 
INTEGRAL Likelihood = -2.76 Transmembrane 140 - 156 ( 137 - 157) 
INTEGRAL Likelihood = -2.60 Transmembrane 216 - 232 ( 213 - 232)
PERIPHERAL Likelihood = 0.74 235 
modified ALOM score: 3.00 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.6010 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
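For this protein, the ALOM count and value printed above (4 and -12.52) coincide with the number of INTEGRAL segments listed and the most negative of their likelihoods, and the same correspondence holds for the single-segment listing earlier in this section (count 1, value -3.03). The Python sketch below parses the INTEGRAL lines and reproduces those two figures. It only illustrates that observed correspondence, assuming the printed threshold of 0.0; it is not a reimplementation of ALOM, whose likelihood calculation is not described here, and the function names are hypothetical.

```python
import re

# For example: "INTEGRAL Likelihood =-12.52 Transmembrane 60 - 76 ( 54 - 83)"
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(lines):
    """Return (likelihood, start, end, ext_start, ext_end) per INTEGRAL line."""
    segments = []
    for line in lines:
        m = INTEGRAL.search(line)
        if m:
            likelihood = float(m.group(1))
            start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
            segments.append((likelihood, start, end, ext_start, ext_end))
    return segments


def count_and_value(segments, threshold=0.0):
    """Assumption: count = segments below the threshold, value = lowest likelihood."""
    below = [s[0] for s in segments if s[0] < threshold]
    return len(below), (min(below) if below else None)


listing = [
    "INTEGRAL Likelihood =-12.52 Transmembrane 60 - 76 ( 54 - 83)",
    "INTEGRAL Likelihood = -3.66 Transmembrane 178 - 194 ( 177 - 196)",
    "INTEGRAL Likelihood = -2.76 Transmembrane 140 - 156 ( 137 - 157)",
    "INTEGRAL Likelihood = -2.60 Transmembrane 216 - 232 ( 213 - 232)",
]
segments = parse_integral(listing)
print(count_and_value(segments))  # (4, -12.52)
print(sorted(s[1:3] for s in segments))  # [(60, 76), (140, 156), (178, 194), (216, 232)]
```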

The protein has homology with the following sequences in the databases: 

37.9/63.7% over 193aa 

Bacillus subtilis 

EGAD|45886| hypothetical 30.7 kd lipoprotein in glnq-ansr intergenic region precursor
Insert characterized

SP|P54544|YQJG_BACSU HYPOTHETICAL 30.7 KDA LIPOPROTEIN IN GLNQ-ANSR INTERGENIC REGION
PRECURSOR. Insert characterized

GP|1303958|dbj|BAA12613.1||D84432 YqjG Insert characterized

GP|2634823|emb|CAB14320.1||Z99116 similar to lipoprotein SpoIIIJ-like Insert
characterized

PIR|G69963|G69963 lipoprotein SpoIIIJ-like homolog yqjG - Insert characterized
ORF02470(478 - 1038 of 1530)

EGAD|45886|BS2384 (63 - 256 of 275) hypothetical 30.7 kd lipoprotein in glnq-ansr intergenic
region precursor {Bacillus subtilis}SP|P54544|YQJG_BACSU HYPOTHETICAL 30.7 KDA LIPOPROTEIN
IN GLNQ-ANSR INTERGENIC REGION PRECURSOR. GP|1303958|dbj|BAA12613.1||D84432 YqjG {Bacillus
subtilis}GP|2634823|emb|CAB14320.1||Z99116 similar to lipoprotein SpoIIIJ-like {Bacillus
subtilis}PIR|G69963|G69963 lipoprotein SpoIIIJ-like homolog yqjG - Bacillus subtilis
%Match =13.0 

%Identity =37.9 %Similarity =63.7 

Matches = 72 Mismatches = 65 Conservative Sub.s = 49 

252 282 312 342 372 402 432 462 

FCGSIV*FLKKK*NR*W*KLEELKTLKKTLKRILFSSLSLSMLLLLTGCVSVDKAGKPYGVIWNTLGVPMANLITYFAQ

MLKTYQKLLAMGIFLIVLCSGNAAFAATNQVGGLSNVGFFHDYLIEPFSALLKGVAG 
10 20 30 40 50 

492 522 552 582 612 642 672 702 

HQGLGFGVAIIIVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPINERLRNAKTQEEKLAAQTELMTAQRENGLSM

:|::||:||:|||=|:||| : | | |||| || : | :|: | |:: I 1=1 :|: ::

LFHGEYGLSIILVTIIVRIWLPLFVNQFKKQRIFQEKMAVIKPQVDSIQVKLKKTKDPEKQKELQMEMMKLYQEHNINP
70 80 90 100 110 120 130
732 762 792 822 852 894 918
FGGIGCLPLLIQMPFFSAIFFAARYTPGVSSATFLGLNLGQKSLTLTVIIAILYFVQSWLSMQ--GVPDE--QRQQ

:||||:||| I :::| I II = = l =11 -III = = = = 1 = 1111:: II = II =1

-IJWGCIiPMLIQSPIMIGLYYAIRSTPEIASHSFLWFSLGQSDILMSIiSAGIMYFVQAYIAQKLSAKYSAVPQNPAAQQS
150 160 170 180 190 200 210

948 978 1008 1038 1068 1098 1128 1158

MKTMMYLMPIMMVFMSISLPASVALYWFIGGIFSIIQQLVTTYVLKPKLRRKVEEEYTKNPPKAYKANNARKDVTNSTKA

I |::: 1 = 11 I = = = I I = = I I I I 1 = 1 =1 =1 = II

AKLMVFIFPVMMTIFSLNVPAALPLYWFTSGLFLTVQNIVLQMTHHKSKKTAALTESVK

230 240 250 260 270 

37.2/62.0% over 220aa 
Listeria monocytogenes 
GP|6117974| membrane protein homolog Insert characterized

ORF02470(430 - 1086 of 1530)

GP|6117974|gb|AAF03934.1|AF139908_4|AF139908(3 - 223 of 237) membrane protein homolog
{Listeria monocytogenes}
%Match =14.6

%Identity =37.1 %Similarity =62.0 

Matches = 82 Mismatches = 81 Conservative Sub.s = 55 

285 315 345 375 405 435 465 495 

K*NR* VY* KLEELKTLKKTLKRILFSSLSLSMLLLLTGCVSVDKAGKPYGVIWNTLGVPMANLITYFAQHQGLGFGVAII

I =:| = h I =1=111 
IQPFTSFIMFVAKFVGGNYGIAII 
10 20 

525 555 585 615 645 675 705 735

IVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPINERLRNAKTQEEKLAAQTELMTAQRENGLSMFGGIGCLPLLI

I |s::| :|:|| I = III II = I 11= I ==ll= I 1=1 = == =1111111 

ITTLLIRALIMPLNLRTAKAQMGMQSKMAVAKPEIDEIQARLKRATSKEEQATIQKEMMAVYSKYNINP-MQMGCLPLLI
40 50 60 70 80 90 100 

765 795 825 855 885 915 945 975 

QMPFFSAIFFAARYTPGVSSATFLGLNLGQKSLTLTVIIAILYFVQSWLSMQGVPDEQRQQMKTMMYLMPIMMVFMSISL 
QMPILMAFYYAIRGSSEIASHTFLWFNLGSPDMVLAIIAGLVYLAQYFVSMIGYSPEQKKQMKIIGLMSPIMILFVSFTA
40 120 130 140 150 160 170 180 

1005 1035 1086 1116 1146 1176 1206 

PASVALYWFIGGIFSIIQQLVTT--YVLK-PKLRRKVEEEYTKNPPKAYKANNARKDVTNSTKATESNQAIITSKKTNRN

l===llll =11=1 I 1=1 h I l=== =11
PSALALYWAVGGLFLAGQTLLTKKLYMNKHPEIKVMEQEEKEFEQIVEEQKKEK

200 210 220 230 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1552

A DNA sequence (GBSx1644) was identified in S.agalactiae <SEQ ID 4791> which encodes the amino
acid sequence <SEQ ID 4792>. This protein is predicted to be amino acid ABC transporter, permease
protein. Analysis of this protein sequence reveals the following:

Possible site: 48
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.98 Transmembrane 32 - 48 ( 23 - 53)
INTEGRAL Likelihood = -9.18 Transmembrane 195 - 211 ( 189 - 213)
INTEGRAL Likelihood = -8.70 Transmembrane 72 - 88 ( 62 - 93)

Final Results

bacterial membrane Certainty=0.4991 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12131 GB:Z99105 similar to amino acid ABC transporter
(permease) [Bacillus subtilis]
Identities = 116/217 (53%) , Positives = 168/217 (76%)

Query: 2 INWDAIFNLELAVKAFPSVIQGLPYTIGLSLVGFILGAIVGFFVALMKMSHFRLLRYLAN 61

I W+ IFN +LA+++FP VI+G+ YT+ +S V G ++G F++L +MS LLR+ A
Sbjct: 5 IQWEYIFNTKLAIESFPYVIKGIGYTLLISFVSMFAGTVIGLFISLARMSKLALLRWPAK 64

Query: 62 IHISLMRGIPLMVLLFLIYFGLPFIGIQLDAVTASIVGFTMMSSAYISEIIRAALLAVDH 121

++IS MRG+P++V+LF++YFG P+IGI+ AVTA+++GF++ S+AYI+EI R+A+ +V+
Sbjct: 65 LYISFMRGVPILVILFILYFGFPYIGIEFSAVTAALIGFSLNSAAYIAEINRSAISSVEK 124

Query: 122 GQWEAARALGLKTPTIYRGIIIPQATRIALPSLSNVLLDMVKSSSLTAMITVPDIFNNAK 181

GQWEAA +LGL RGII+PQ+ RIALP L+NVLLD++K+SSL AMITVP++ +AK

Sbjct: 125 GQWEAASSLGLSYWQTMRGIILPQSIRIALPPLANVLLDLIKASSLAAMITVPELLQHAK 184

Query: 182 IVGGTYSDYMTAYILVALIYWVICTLYAIIQDWVEKR 218

I+GG DYMT YIL ALIYW IC++ A+ Q+ EK+ 
Sbjct: 185 IIGGREFDYMTMYILTALIYWAICSIAAVFQNILEKK 221 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4793> which encodes the amino acid
sequence <SEQ ID 4794>. Analysis of this protein sequence reveals the following:

Possible site: 23 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.79 Transmembrane 186 - 202 ( 184 - 205) 

INTEGRAL Likelihood = -5.84 Transmembrane 26 - 42 ( 21 - 43) 

INTEGRAL Likelihood = -4.78 Transmembrane 57 - 73 ( 56 - 84) 

INTEGRAL Likelihood = -1.59 Transmembrane 86 - 102 ( 86 - 103) 



Final Results 

bacterial membrane Certainty=0.3718 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12131 GB:Z99105 similar to amino acid ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 113/214 (52%) , Positives = 157/214 (72%) 

Query: 1 MINIPLMKDSLGFVLSGLPYTLGISLLSFFTGLFLGLGLALLGRSRQPLIHYLVRAYISI 60 

+ N L +S +V+ G+ YTL IS +S F G +GL ++L S+ L+ + + YIS 
Sbjct: 10 IFNTKLAIESFPYVIKGIGYTLLISFVSMFAGTVIGLFISLARMSKLALLRWPAKLYISF 69

Query: 61 MRGVPMIVVLFVLYFGLPYYGLELPALLCAYLGFSMVSAAYISEVFRSSIEAIDKGQWEA 120

MRGVP++V+LF+LYFG PY G+E A+ A +GFS+ SAAYI+E+ RS+I +++KGQWEA
Sbjct: 70 MRGVPILVILFILYFGFPYIGIEFSAVTAALIGFSLNSAAYIAEINRSAISSVEKGQWEA 129

Query: 121 AKALGLPYALMVKKIILPQAFRIAVPPLGNVIIDMVKSSSLAAMITVPDIFQNAKIIGGR 180

A +LGL Y ++ IILPQ+ RIA+PPL NV++D++K+SSLAAMITVP++ Q+AKIIGGR
Sbjct: 130 ASSLGLSYWQTMRGIILPQSIRIALPPLANVLLDLIKASSLAAMITVPELLQHAKIIGGR 189

Query: 181 EWDYMSMYILVAFIYWLIAFLLERYQEFLENKLA 214 

E+DYM+MYIL A IYW I + +Q LE K A 
Sbjct: 190 EFDYMTMYILTALIYWAICSIAAVFQNILEKKYA 223 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 110/213 (51%) , Positives = 156/213 (72%) 

Query: 7 IFNLELAVKAFPSVIQGLPYTIGLSLVGFILGAIVGFFVALMKMSHFRLLRYLANIHISL 66 

+ N+ L + V+ GLPYT+G+SL+ F G +G +AL+ S L+ YL +IS+ 
Sbjct: 1 MINIPLMKDSLGFVLSGLPYTLGISLLSFFTGLFLGLGLALLGRSRQPLIHYLVRAYISI 60 
Query: 67 MRGIPLMVLLFLIYFGLPFIGIQLDAVTASIVGFTMMSSAYISEIIRAALLAVDHGQWEA 126

MRG+P++V+LF++YFGLP+ G++L A+ + +GF+M+S+AYISE+ R+++ A+D GQWEA
Sbjct: 61 MRGVPMIVVLFVLYFGLPYYGLELPALLCAYLGFSMVSAAYISEVFRSSIEAIDKGQWEA 120

Query: 127 ARALGLKTPTIYRGIIIPQATRIALPSLSNVLLDMVKSSSLTAMITVPDIFNNAKIVGGT 186

A+ALGL + + II+PQA RIA+P L NV++DMVKSSSL AMITVPDIF NAKI+GG
Sbjct: 121 AKALGLPYALMVKKIILPQAFRIAVPPLGNVIIDMVKSSSLAAMITVPDIFQNAKIIGGR 180

Query: 187 YSDYMTAYILVALIYWVICTLYAIIQDWVEKRL 219

DYM+ YILVA IYW+I h Q++ E +L
Sbjct: 181 EWDYMSMYILVAFIYWLIAFLLERYQEFLENKL 213

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1553 

A DNA sequence (GBSx1645) was identified in S.agalactiae <SEQ ID 4795> which encodes the amino
acid sequence <SEQ ID 4796>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12132 GB:Z99105 similar to amino acid ABC transporter
(binding protein) [Bacillus subtilis]
Identities = 127/276 (46%) , Positives = 183/276 (66%) , Gaps = 12/276 (4%)

Query: 3 KTILLGLVGLSAMTLAACS--NGQSSKETTWDNIKKDGVLKVATPATLYPTSYYDDHK-- 58 

K ++ + LAACS N SK+T W+ IK G + VAT TLYPTSY+D 

Sbjct: 8 KAVIFSFTMAFFLILAACSGKNEADSKDTGWEQIKDKGKIVVATSGTLYPTSYHDTDSGS 67

Query: 59 -KLTGYEIDMMKAIAKKLKIKVKFVEVGVAESFTSVDSGKVDVAVNNFDTTPERIiKKYNF 117

KLTGYE+ + + + + AK+L +KV+F E+G+ T+V+SG+VD A N+ D T +R +K+ F
Sbjct: 68 DKLTGYEVEVVRFAAKRLGLKVEFKEMGIDGMbTAVNSGQVDAMNDIDVTKDREEKFAF 127

Query: 118 SQPYKYSVGGMIVROTGSSKITAKDLSDWKGKKAGGGAGTQYMKIAKQQGAEPVIYDNVT 177

S PYKYS G IVR D SI K L D KGKKA GAT YM++A++ GA+ VIYDN T
Sbjct: 128 STPYKYSYGTAIVRKDDLSGI--KTLKDLKGKKAAGAATTVYMEVARKYGAKEVIYDNAT 185

Query: 178 NDVYLRDVSTGRTDFIPNDYYTQVIAVKYVTKQYPDIKVKM-GDVKYNPTEQGIVMSKKD 236
N+ YL+DV+ GRTD I NDYY Q +A+ +PD+ + + D+KY P +Q +VM K +

Sbjct: 186 NEQYLKDVANGRTDVIIiNDYYLQTLAL----AAFPDLNITIHPDIKYMPNKQALVMKKSN 241

Query: 237 KSLKTKIDAAIKDMKKDGSLKKISEKYYAGQDLTKE 272
+L+ K++ A+K+M KDGSL K+S++++ D++K+
Sbjct: 242 AALQKKMNEALKEMSKDGSLTKLSKQFFNKADVSKK 277

There is also homology to SEQ ID 1190.

SEQ ID 4796 (GBS183) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 26 (lane 2; MW 33kDa). 



GBS183-His was purified as shown in Figure 199, lane 7.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1554 

A DNA sequence (GBSx1646) was identified in S.agalactiae <SEQ ID 4797> which encodes the amino
acid sequence <SEQ ID 4798>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1514 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF09821 GB:AE0018S5 6-aminohexanoate-cyclic-dimer hydrolase

[Deinococcus radiodurans] 
Identities = 178/488 (36%) , Positives = 265/488 (53%) , Gaps = 17/488 (3%) 

Query: 5 DATAMVQAIKQHKISSQELVEQAIYKIEEQNVSVNAVVSKQYNEARQAAKYANESNA--- 61
DA + Q ++ ++S++++ AI++ + NV++NAVV Y++ A+ + + A

Sbjct: 54 DALDLAQLFRRGELSAEDMCTAAIHRAQVVNVALNAVVYPLYDQGIAQARATDAARARGE 113

Query: 62 PFAGVPILLKDLGQNQKGQLSTSGSQLFKHYHAKQTDYLVQSFEKLGFIILGRTNT 117

PFAGVP L+KD G G T G++ ++ + D LV+ ++ G + LG+TNT
Sbjct: 114 C^TGPFAGVPFLVKDFGSRLRGVPHTGGTRAYRDQIPEWDDELVRRWQAAGLLPLGKTNT 173



Query: 118 PEFGFKNISDGQLHGNVNLPFDHSRNAGGSSGGAAAAVSSGMVPIAGASDGGGSIRIPAS 177 

PEF +++ +LHG P+D R GGSSGG+A+AV++G+VP+AGA DGGGSIRIPAS 
Sbjct: 174 PEFAIMGVTEPEIJIGPTRNPWDI^RTPGGSSGGSASAVAAGIVPLAGAGDGGGSIRIPAS 233

Query: 178 FNGLIGLKPSRGRIPVGPSSYRGWQGASSHFALTKSVRDTKRLLYYLQSYQVES PF 233

GL GLKPSRGR+P G WQGA+ LT+SVRD+ LL Q + P

Sbjct: 234 CCGLFGLKPSRGRVPCGDGVGEPWQGAAVEHVLTRSVRDSAALLDLEQGPDAGAALFLPS 293

Query: 234 PLKKLSKESLFEFSVSKPLKIAVLMDSPLKTKVSSEAKAAIKEAADFLSQKGNHLELVEQ 293
P + S+E E L+I PL V E AA++ AA L G+ +E V

Sbjct: 294 PERPYSEEVGRE PGRLRIGFSTAHPLGRSVHPECVAAVQGAARLLESLGHEVEEVAL 350

Query: 294 PLDGIHSMKTYCMMNSVETAAMFDDIEKSLGRSMEFSDMELMTWAMYQSGQRVLAKDYSK 353
P DG + + M+ ET A + +LGR SD+E +TW + Q G+ A D++

Sbjct: 351 PWDGPAIAQAFLMLYFGETGASIAALRDTLGRPARASDVEAVTWLLGQLGRSYSAADFAA 410

Query: 354 LLDSWDQFAATMARFHENYDLILTAATNQPAP FHGQFD LDETLQKQLRHMGEFSVSE 410

SW+ A M RFH+NYDL+LT P G+ + L++M +

Sbjct: 411 ARASWNVHARAMGRFHQNYDLLLTPVLATPPLQIGELQPRGVQAALLRAAQQMDVSGLIiR 470

Query: 411 QQDLIWKMFEDSMAWTPFTHQPNLTGQPSLAIPTHLTKEGLPLGVQLTAAKGREDLLLAV 470 

+ + + D + P+T NLTGQP++++P H T +GLP+GVQ A RED+LL + 
Sbjct: 471 RSGQVDALATDILEKMPYTQLANLTGQPAMSVPLHWTADGLPVGVQFVAPLAREDVLLRL 530 
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Query: 471 AELFEKEK 478 

A E+ + 
Sbjct: 531 AGQLEQAR 538 



55 A related DNA sequence was identified in S.pyogenes <SEQ ID 4047> which encodes the amino acid 
sequence <SEQ ID 4048>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 






Final Results 
bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 277/484 (57%), Positives = 348/484 (71%), Gaps = 2/484 (0%) 

Query: 1 MVFKDATAMVQAIKQHKISSQELVEQAIYKIEEQNVSVNAVVSKQYNEARQAAKVANESN 60
M ++DATAM A++ + + ELV QAIYK ++ N ++NA+ S+++ A + AK + S
Sbjct: 1 MTYQDATAMAIAVQTGQTTPLELVTQAIYKAKKMPTLNAITSERFEAALEEAKQRDFSG 60

Query: 61 APFAGVPILLKDLGQNQKGQLSTSGSQLFKHYHAKQTDYLVQSFEKLGFIILGRTNTPEF 120
PFAGVP+ LKDLGQ KG STSGS+LFK Y A +TD V+ E LGFIILGR+NTPEF
Sbjct: 61 LPFAGVPLFLKDLGQELKGHSSTSGSRLFKEYQATKTDLFVKRLEALGFIILGRSNTPEF 120

Query: 121 GFKNISDGQLHGNVNLPFDHSRNAGGSSGGAAAAVSSGMVPIAGASDGGGSIRIPASFNG 180
GFKNISD LHG VNLP D++RNAGGSSGGAAA VSSG+ +A ASDGGGSIRIPASFNG
Sbjct: 121

Query: 181
LIGLKPSRGR+PVGP SYR WQGAS HFALTKSVRDT+ LLYYLQ Q+ESPFPL L+K
Sbjct: 181

Query: 241
+S+++ S+ +PL IA + VS + A+++A +L ++G+ L EL E P++
Sbjct: 241

Query: 300
++ Y +MNSVETAAMF DIE + GR M DME MTWA+YQSG+ + A YS++L WD
Sbjct: 300

Query: 360
++ATMA FHE YDL+LT TN PAP HG+ DLL FS EQ +L+ MF
Sbjct: 360

Query: 420
S+A P+T PNLTGQP++++PT+ TKEGL +G+QL AAKGREDLLL +AE FE
Sbjct: 420

Query: 480
K P
Sbjct: 480



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1555 

A DNA sequence (GBSx1647) was identified in S.agalactiae <SEQ ID 4799> which encodes the amino 
acid sequence <SEQ ID 4800>. This protein is predicted to be transcription elongation factor (greA). 
Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.5003 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14674 GB:Z99117 transcription elongation factor [Bacillus subtilis] 
Identities = 86/154 (55%), Positives = 114/154 (73%), Gaps = 1/154 (0%) 
Query: 3 EKTYPMTQVEKDQLEKELEELKLVRRPEVVERIKIARSYGDLSENSEYDAAKDEQAFVEG 62 

EK +PMT K +LE+ELE LK V+R EVVERIKIARS+GDLSENSEYD+AK+EQAFVEG 
Sbjct: 4 EKVFPMTAEGKQKLEQELEYLKTVKRKEVVERIKIARSFGDLSENSEYDSAKEEQAFVEG 63 

Query: 63 QIQILETKIRYAEIIDSDAVAKDEVAIGKTVLVQEVGTNDKDTYHIVGAAGADIFSGKIS 122 

++ LE IR A+II+ D + V +GKTV E+ D+++Y IVG+A AD F GKIS 
Sbjct: 64 RVTTLENMIRNAKIIEDDG-GSNVVGLGKTVTFVELPDGDEESYTIVGSAEADPFEGKIS 122 

Query: 123 NESPIAHALIGKKTGDLATIESPAGSYQVEIISV 156 

N+SPIA +L+GKK + T+++P G V+I+ + 
Sbjct: 123 NDSPIAKSLLGKKVDEEVTVQTPGGEMLVKIVKI 156 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4801> which encodes the amino acid 
sequence <SEQ ID 4802>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/160 (90%), Positives = 149/160 (92%) 

Query: 1 MAEKTYPMTQVEKDQLEKELEELKLVRRPEVVERIKIARSYGDLSENSEYDAAKDEQAFV 60 

MAEKTYPMT EK+QLEKELEELKLVRRPE+VERIKIARSYGDLSENSEYDAAKDEQAFV 
Sbjct: 17 MAEKTYPMTLTEKEQLEKELEELKLVRRPEIVERIKIARSYGDLSENSEYDAAKDEQAFV 76 

Query: 61 EGQIQILETKIRYAEIIDSDAVAKDEVAIGKTVLVQEVGTNDKDTYHIVGAAGADIFSGK 120 

EGQI LETKIRYAEIIDSDAVAKDEVAIGKTV+VQEVGT DKDTYHIVGAAGADIFSGK 
Sbjct: 77 EGQISTLETKIRYAEIIDSDAVAKDEVAIGKTVIVQEVGTTDKDTYHIVGAAGADIFSGK 136 

Query: 121 ISNESPIAHALIGKKTGDLATIESPAGSYQVEIISVEKTN 160 

ISNESPIA ALIGKKTGD IESPA +Y VEIISVEKTN 
Sbjct: 137 ISNESPIAQALIGKKTGDKVRIESPAATYDVEIISVEKTN 176 
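Each BLAST block gives a start coordinate, the aligned residues, and an end coordinate, and in an OCR'd listing like this one it is useful to check that these agree. The sketch below uses two hypothetical helpers (`block_stats` and `end_coordinate` are illustrative names, not part of any BLAST tool) to count identical columns and gap columns between a query row and a subject row, and to recompute the end coordinate from the start coordinate plus the number of non-gap residues. It assumes labels, coordinates and spaces have already been stripped and that gaps are written as `-`. Positives are not computed because they depend on a substitution matrix (blastp defaults to BLOSUM62), and the integer percentage here is truncated, which may not match how every BLAST version rounds.

```python
def block_stats(query: str, sbjct: str) -> dict:
    """Count identical columns and gap columns for one aligned block.

    Assumes both rows have the same length, contain only residue letters
    and '-' for gaps, and carry no labels or coordinates.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(query, sbjct))
    identities = sum(1 for q, s in columns if q == s and q != "-")
    gaps = sum(1 for q, s in columns if q == "-" or s == "-")
    return {
        "columns": len(columns),
        "identities": identities,
        "gaps": gaps,
        # Truncated integer percentage (an assumption; see the lead-in).
        "pct_identity": 100 * identities // len(columns),
    }


def end_coordinate(row: str, start: int) -> int:
    """End position implied by a row: start plus non-gap residues, minus one."""
    return start + sum(1 for c in row if c != "-") - 1


if __name__ == "__main__":
    # Last block of the Example 1555 GAS/GBS alignment, spaces removed.
    q = "ISNESPIAHALIGKKTGDLATIESPAGSYQVEIISVEKTN"
    s = "ISNESPIAQALIGKKTGDKVRIESPAATYDVEIISVEKTN"
    print(block_stats(q, s))
    print(end_coordinate(q, 121), end_coordinate(s, 137))
```

For this single block, 33 of 40 columns are identical, and the recomputed end coordinates (160 and 176) agree with the printed ones. A disagreement would point to a dropped or doubled residue in the transcription, for example a `VV` read as `W`.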

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1556 

A DNA sequence (GBSx1648) was identified in S.agalactiae <SEQ ID 4803> which encodes the amino 
acid sequence <SEQ ID 4804>. This protein is predicted to be aminodeoxychorismate lyase-like protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.64 Transmembrane 238 - 254 ( 230 - 260) 



Final Results 



bacterial cytoplasm Certainty=0.4434 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>



Final Results

bacterial membrane Certainty=0.6456 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>



The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAF77615 GB:AF151720 aminodeoxychorismate lyase-like protein 
[Streptococcus thermophilus] 
Identities = 135/210 (64%), Positives = 171/210 (81%) 
Query: 373 KTTSTPYKADDFLKLVQDETFIKKMVAKYPNLLGSLPDKSKAIYQLEGYLFPATYNYYKD 432 

K +ST K DFLKL++D+ FI KM AKYP LL +LP+ + A Y LEGYLFPATYN + D 
Sbjct: 5 KHSSTGLKEKDFLKLMKDDAFITKMKAKYPTLLANLPNSTDAKYVLEGYLFPATYNIHDD 64 

Query: 433 TTLEGLVEDMISTMNTKMAPYYNTIKAKNMSVNDVLTLSSLVEKEGSTDEDRRKIASVFY 492 

TT+E L E+M+ TM+T ++PYY TI + N +VN++LTL+SLVEKEG+TD+DR+ IASVFY 
Sbjct: 65 TTVESLAEEMLFTMDTHLSPYYATILSSNHNVNEILTLASLVEKEGATDDDRKNIASVFY 124 

Query: 493 NRLSAGQALQSNIAILYAMGKLGDKTSLAEDAQINTSIKSPYNIYTNTGLMPGPVDSPSI 552 

NRL++ ALQSNIA+LY +GKLG +T+L EDA I+T+I SPYN Y + GLMPGPVDSPS+ 
Sbjct: 125 NRLNSDMALQSNIAVLYVLGKLGQETTLKEDATIDTNIDSPYNDYVHKGLMPGPVDSPSL 184 

Query: 553 SAIEATIKPASTDYLYFVADVKTGNVYYAK 582 

SAIEA I P+ST Y+YFVADV TGNVY+A+ 
Sbjct: 185 SAIEAVINPSSTKYMYFVADVSTGNVYFAE 214 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4805> which encodes the amino acid 
sequence <SEQ ID 4806>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.91 Transmembrane 161 - 177 ( 155 - 183) 

Final Results 

bacterial membrane Certainty=0.4163 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF77615 GB:AF151720 aminodeoxychorismate lyase-like protein 
[Streptococcus thermophilus] 
Identities = 135/212 (63%), Positives = 161/212 (75%) 

Query: 295 KTKKAKTPFNEKDFLDLVTDEAFIQDMVKRYPKLLATIPTKEKAIYRLEGYLFPATYNYY 354 

K K + T EKDFL L+ D+AFI M +YP LLA +P AY LEGYLFPATYN + 
Sbjct: 3 KGKHSSTGLKEKDFLKLMKDDAFITKMKAKYPTLLANLPNSTDAKYVLEGYLFPATYNIH 62 

Query: 355 KETTMRELVEDMLAAMDATLVPYYDKIAASGKTVNEVLTLASLVEKEGSTDDDRRQIASV 414 

+TT+ L E+ML MD L PYY I +S VNE+LTLASLVEKEG+TDDDR+ IASV 
Sbjct: 63 DDTTVESLAEEMLFTMDTHLSPYYATILSSNHNVNEILTLASLVEKEGATDDDRKNIASV 122 

Query: 415 FYNRLNSGMALQSNIAILYAMGKLGEKTTLAEDATIDTTINSPYNIYTNTGLMPGPVASS 474 

FYNRLNS MALQSNIA+LY +GKLG++TTL EDATIDT I+SPYN Y + GLMPGPV S 
Sbjct: 123 FYNRLNSDMALQSNIAVLYVLGKLGQETTLKEDATIDTNIDSPYNDYVHKGLMPGPVDSP 182 

Query: 475 GVSAIEATLNPASTDYLYFVANVHTGEVYYAK 506 

+ SAIEA +NP+ST Y+YFVA+V TG VY+A+ 
Sbjct: 183 SLSAIEAVINPSSTKYMYFVADVSTGNVYFAE 214 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 310/603 (51%), Positives = 403/603 (66%), Gaps = 86/603 (14%) 

Query: 1 MTEFNDDQHSNHDQKSFKEQILAELEFANRIjRKIiREEELYQKEQEAKEAARRTAQLM 60 

+T+F D + Q+SFKEQILAELE+AN++RK +EEEL+ 
Sbjct: 3 LTDFKDKDQQDQ-QRSFKEQILAELEKANQIRKEKEEELF 41 

Query: 61 EAQRLKDEREARAKALETKQRLEEQEKARIEAKLLAEAAREEERRQAEQALASQEEQVIN 120 
++ LE +E AR A+L AE R++ A Q+E + +
Sbjct: 42 QKELEAKEAARRTAQLYAEYKRQD -AFQKESIAH 74

Query: 181 EWKLGEISELEPVAKEPIRVEDLSKEEEGIALSAKNKHNKRER---RQKADNVAKRIAR 237 
EN L ++ A E +++ + +E + L+ + H+ R + RQ+ + AK+I+ 
Sbjct: 104 ENSSLKTTNKRWQANE LQETASKESQVPLTIEKGHSVRRKLSKRQQTERAAKKIST 160 

Query: 238 ILISIIILVLLLTAFVGYRFVDSAIKPVDSNSNKFVQVEIPIGSGNKLIGQILEKAGVIK 297 
+LIS II+ LL G +V SA+ PVD NS+ FVQVEIP GSGNKLIGQIL+K G+IK 
Sbjct: 161 VLISSIIITLLAVTLAGAGYVYSALNPVDKNSDAFVQVEIPSGSGNKLIGQILQKKGLIK 220

Query: 298 SAWENYYSKFKNYSNFQSGYYNLKKSMTLDQIAAELEKGGTAEPTKPALGKILITEGYT 357
++TVF++Y+KFKN++NFQSGYYNL+KSM+L++IA+ L++GGTAEPTKP+LGKILI EGYT 
Sbjct: 221 NSTVFSFYTKFKNFTNFQSGYYNLQKSMSLEEIASALQEGGTAEPTKPSLGKILIPEGYT 280

Query: 358 IKQIAKAIESN-KIDTKTTSTPYKADDFLKLVQDETFIKKMVAKYPNLLGSLPDKSKAIY 416
IKQIAKA+E N K TK TP+ DFL LV DE FI+ MV +YP LL ++P K KAIY 
Sbjct: 281 IKQIAKAVEHNSKGKTKKAKTPFNEKDFLDLVTDEAFIQDMVKRYPKLLATIPTKEKAIY 340

Query: 417 QLEGYLFPATYNYYKDTTLEGLVEDMISTMNTKMAPYYNTIKAKNMSVNDVLTLSSLVEK 476
+LEGYLFPATYNYYK+TT+ LVEDM++ M+ + PYY  I A +VN+VLTL+SLVEK 
Sbjct: 341 RLEGYLFPATYNYYKETTMRELVEDMLAAMDATLVPYYDKIAASGKTVNEVLTLASLVEK 400

Query: 477 EGSTDEDRRKIASVFYNRLSAGQALQSNIAILYAMGKLGDKTSLAEDAQINTSIKSPYNI 536
EGSTD+DRR+IASVFYNRL++G ALQSNIAILYAMGKLG+KT+LAEDA I+T+I SPYNI 
Sbjct: 401 EGSTDDDRRQIASVFYNRLNSGMALQSNIAILYAMGKLGEKTTLAEDATIDTTINSPYNI 460

Query: 537 YTNTGLMPGPVDSPSISAIEATIKPASTDYLYFVADVKTGNVYYAKDFETHKANVEKYIN 596
YTNTGLMPGPV S +SAIEAT+ PASTDYLYFVA+V TG VYYAK FE H ANVEKY+N 
Sbjct: 461 YTNTGLMPGPVASSGVSAIEATLNPASTDYLYFVANVHTGEVYYAKTFEEHSANVEKYVN 520

Query: 597 SQI 599 
SQI 
Sbjct: 521 SQI 523 





A related GBS gene <SEQ ID 8843> and protein <SEQ ID 8844> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -17.88 
GvH: Signal Score (-7.5): -3.51 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -13.64 threshold: 0.0 

INTEGRAL Likelihood =-13.64 Transmembrane 238 - 254 ( 230 - 260) 
PERIPHERAL Likelihood = 5.78 285 
modified ALOM score: 3.23 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6456 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>
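Each "Final Results" block in these examples lists three candidate bacterial localizations (cytoplasm, membrane, outside), each with a certainty score and an "Affirmative" or "Not Clear" label. As a minimal sketch, assuming the normalized layout `bacterial membrane Certainty=0.6456 (Affirmative) <succ>` with no spaces inside the number (raw OCR text would need cleaning first), the blocks can be parsed and the highest-scoring affirmative call picked out as below. The function names `parse_final_results` and `affirmative_call` are hypothetical.

```python
import re

# Matches lines such as "bacterial membrane Certainty=0.6456 (Affirmative) <succ>".
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\s+"
    r"Certainty=(\d+\.\d+)\s+\((Affirmative|Not Clear)\)"
)


def parse_final_results(text: str) -> list[tuple[str, float, str]]:
    """Return (location, certainty, label) for every result line in text."""
    return [
        (m.group(1), float(m.group(2)), m.group(3))
        for m in RESULT_LINE.finditer(text)
    ]


def affirmative_call(text: str):
    """Highest-certainty 'Affirmative' location, or None if there is none."""
    rows = [r for r in parse_final_results(text) if r[2] == "Affirmative"]
    return max(rows, key=lambda r: r[1]) if rows else None


if __name__ == "__main__":
    block = """
    bacterial membrane Certainty=0.6456 (Affirmative) <succ>
    bacterial outside Certainty=0.0000 (Not Clear) <succ>
    bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>
    """
    print(affirmative_call(block))  # ('membrane', 0.6456, 'Affirmative')
```

Taking the maximum is a choice made for this sketch. When a block has a single affirmative line, the function simply returns that line, and it returns `None` when every line is "Not Clear".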

The protein has homology with the following sequences in the databases: 

ORF0093K1417 - 2046 of 2400) 

GP|8574530|gb|AAF77615.1|AF151720_1 AF151720 (5 - 214 of 214) aminodeoxychorismate lyase-like 
protein {Streptococcus thermophilus} 
%Match =17.5 

%Identity = 64.3 %Similarity =81.4 

Matches = 135 Mismatches = 39 Conservative Sub.s = 36 

1236 1266 1296 1326 1356 1386 1416 1446 

NYYSKFKNYSNFQSGYYNLKKSMTLDQIAAELEKGGTAEPTKPALGKILITEGYTIKQIAKAIESNKIDTKTTSTPYKAD 



AKKGKHSSTGLKEK 
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1476 1506 1536 1566 1596 1626 1656 1686 

DFLKLVQDETFIKKMVAKYPNLLGSLPDKSKAIYQLEGYLFPATYNYYKD 

llll|: = |: I! II llll II =11= = I I lllllllllll : llhl I 1 = 1= Ihl -111 II = I 
DFLKLMKDDAFITKMKAKYPTLLANLPNSTDAKYVLEGYLFPATYNIHDDTTVESLAEEMLFTM 

30 40 50 60 70 80 90 

1716 1746 1776 1806 1836 1866 1896 1926 

SVNDVLTLSSLVEKEGSTDEDRRKIASVFYNRLSAGQALQSNIAILYAMGKLGDKTSLAEDAQINTSIKSPYNIYTNTGL 

:||::||| = llllll|:|hlh I I I I I I I I I : = I I I I I I I = I I =111! = I I III H I llll I * II 
NVNEILTLASLVEKEGATDDDRKNIASVFYNRLNSDMALQSNIAVLYVLGKLGQETTLKEDATIDTNIDSPY 

110 120 130 140 150 160 170 






1956 1986 2016 2046 2076 2106 2136 2166 

MPGPVDSPSISAIEATIKPASTDYLYFVADVKTGNVYYAKDFETHKANVEKYINSQIN*AYKHGASHHVYIFDLKK*KEK 

IIIMIIM II I hi hllllll ll|l|:|: 
MPGPVDSPSLSAIEAVINPSSTKYMYFVADVSTGNVYFAE 
190 200 210 

SEQ ID 8844 (GBS370) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 6; MW 70kDa). 

GBS370-His was purified as shown in Figure 209, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1557 

A DNA sequence (GBSx1649) was identified in S.agalactiae <SEQ ID 4807> which encodes the amino 
acid sequence <SEQ ID 4808>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0183 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

A related GBS nucleic acid sequence <SEQ ID 10077> which encodes amino acid sequence <SEQ ID 
10078> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA98889 GB:Z74367 ORF YDR071C [Saccharomyces cerevisiae] 
Identities = 52/174 (29%), Positives = 81/174 (45%), Gaps = 18/174 (10%) 

Query: 27 MSMIIRNGCLEDLQQVISIEQINFSEAEAASKKAMQERLTIMTDT FLVAEINGR 80 

+ M IE +EDL+Q++++E F E AS++ + RL + + EI G+ 

Sbjct: 10 LHMYIRPLIIEDLKQItNLESOGFPPNERASEEIISFRLINCPELCSGLFIREIEGKEVK 69 

Query: 81 LAGYIEGPVIKGRYLTDDLFHKVSEFPVRVGGFIGITSLSIHPDFKGQGIGTALLAA 137 

L G+I G I Y+T + K+ V IGI S+ I P+++ + + T LL 

Sbjct: 70 KETLIGHIMGTKIPHEYITIESMGKLQ VESSNHIGIHSWIKPEYQKKNLATLLLTD 126 

Query: 138 MKDLVVSQE-RDGISLTCHDDLISFYEMNGFKDEGESDSKHGGSLWYNM 185 

+ +QE +IL H+ LI FYE GFK E+ D W +M 

Sbjct: 127 YIQKLSNQEIGNKIVLIAHEPLIPFYERVGFKIIAENTNVAKDKNFAEQKWIDM 180 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4809> which encodes the amino acid 
sequence <SEQ ID 4810>. Analysis of this protein sequence reveals the following: 

Possible site: 15 
>>> Seems to have no N-terminal signal sequence 
Final Results 

bacterial cytoplasm Certainty=0.2576 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 87/159 (54%), Positives = 117/159 (72%), Gaps = 1/159 (0%) 

Query: 29 MIIRNGCLEDLQQVISIEQINFSEAEAASKKAMQERLTIMTDTFLVAEINGRLAGYIEGP 88 

M+IR DL+ + +IE NFS EA ++ ++E + ++ DTFLVA I+ + GYIEGP 

Sbjct: 1 MLIRQVQGSDLEVIATIESDNFSPQEATTRAVLEEHIRLIPDTFLVALIDQEIVGYIEGP 60 

Query: 89 VIKGRYLTDDLFHKVSEFPVRVGGFIGITSLSIHPDFKGQGIGTALLAAMKDLVVSQERD 148 
V+ L D LFH V++ P + GG+I ITSLSI F+ QG+GTALLAA+KDLVV+Q+R 

Sbjct: 61 VOTTPILEDSLFHGVTKNP-KTGGYIAITSLSIAKHFQQQGVGTALLAALKDLVVAQQRT 119 

Query: 149 GISLTCHDDLISFYEMNGFKDEGESDSKHGGSLWYNMIW 187 
G+ LTCHD LIS+YEMNGF ++G S+S+HGG+LWY MIW 
Sbjct: 120 GLILTCHDYLISYYEMNGFINQGISESQHGGTLWYQMIW 158 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1558 

A DNA sequence (GBSx1650) was identified in S.agalactiae <SEQ ID 4811> which encodes the amino 
acid sequence <SEQ ID 4812>. This protein is predicted to be UDP-N-acetylmuramate-alanine ligase 
(murC/ddlA). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have a cleavable N-term signal seq. 
INTEGRAL Likelihood = -2.60 Transmembrane 272 - 288 ( 270 - 288) 

Final Results 

bacterial membrane Certainty=0.2041 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00294 GB:AF008220 putative UDP-N-acetylmuramate-alanine 
ligase [Bacillus subtilis] 

Identities = 238/432 (55%), Positives = 315/432 (72%), Gaps = 3/432 (0%) 






Query: 5 YHFIGIKGSGMSALALMLHQMGHNVQGSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLE 64 

YHF+GIKG+GMS LA +LH G+ VQGSD++K+ FTQ LE+ +TILPFS NI + 
Sbjct: 4 YHFVGIKGTGMSPLAQILHDNGYTVQGSDIEKFIFTQTALEKRNITILPFSAENIKPGMT 63 

Query: 65 IIAGNAFRPDNNEELAYVIEKGYQFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHV 124 

+IAGNAF PD + E+ + +G RYH+FLGD+M++FTS+ V GAHGKTSTTGLLAHV 
Sbjct: 64 VIAGNAF-PDTHPEIEKAMSEGIPVIRYHKFLGDYMKKFTSVAVTGAHGKTSTTGLLAHV 122 

Query: 125 LKNITDTSFLIGDGTGRGSANANYFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGL 184 

++N TSFLIGDGTG+G+ N+ YFVFEA EY RHF+ Y P+Y+I+TNIDFDHPDYF+ + 
Sbjct: 123 IQNAKPTSFLIGDGTGQGNENSEYFVFEACEYRRHFLSYQPDYAIMTNIDFDHPDYFSSI 182 

Query: 185 EDVFNAFNDYAKQVQKGLFIYGEDPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSD 244 
+DVF+AF + A QV KG+ G+D L +I + P+ YYG + NDF A++I ++ G+ 

Sbjct: 183 DDVFDAFQEMALQVNKGIIACGDDEHLPKIHANVPVVYYGTGEENDFQARNIVKSTEGTT 242 

Query: 245 FKVFYNQEEIGQFHVPAYGKHNILNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEK 304 
F VF F++PAYG HN+LN+ AVIA + ID +++ LK+F GVKRRF EK 

Sbjct: 243 FDVFVRNTFYDTFYIPAYGHHNVLNSIAVIALCHYEEIDSSIIKHALKSFGGVKRRFNEK 302 

Query: 305 IIDDTVIIDDFAHHPTEIIATLDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQA 364 

+ D V+IDD+AHHPTEI T++AARQKYP +EIVA+FQPHTFTRT LDEFA +LS A 
Sbjct: 303 QLGDQVLIDDYAHHPTEIKVTIEAARQKYPDREIVAVFQPHTFTRTQQFLDEFAESLSGA 362 


Query: 365 DSVYLAQIYGSAREVDNGEVKVEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDI 424 

D VYL I+GSARE + G++ + DL KI ++ L+ ++ S L HD AV +FMGAGDI 
Sbjct: 363 DCVYLCDIFGSARE-NAGKLTIGDLQGKI-HNAKLIEEDDTSVLKAHDKAVLIFMGAGDI 420 

Query: 425 QLYERSFEELLA 436 

Q Y R++E ++A 
Sbjct: 421 QKYMRAYENVMA 432 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4813> which encodes the amino acid 
sequence <SEQ ID 4814>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -4.57 Transmembrane 271 - 287 ( 269 - 288) 

Final Results 

bacterial membrane Certainty=0.2826 (Affirmative) <succ>
bacterial outside Certainty=0.0000 (Not Clear) <succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC00294 GB:AF008220 putative UDP-N-acetylmuramate-alanine 
ligase [Bacillus subtilis] 
Identities = 236/431 (54%), Positives = 310/431 (71%), Gaps = 2/431 (0%) 

Query: 5 YHFIGIKGSGMSALALMLHQMGHKVQGSDVEKYYFTQRGLEQAGITILPFSEDNITPDME 64 

YHF+GIKG+GMS LA +LH G+ VQGSD+EK+ FTQ LE+ ITILPFS +NI P M 
Sbjct: 4 YHFVGIKGTGMSPLAQILHDNGYTVQGSDIEKFIFTQTALEKRNITILPFSAENIKPGMT 63 

Query: 65 LIVGNAFRENNKEVAYALRHQIPFKRYHDFLGDFMKSFISFAVAGAHGKTSTTGLLSHVL 124 
+I GNAF + + E+ A+ IP RYH FLGD+MK F S AV GAHGKTSTTGLL+HV+ 

Sbjct: 64 VIAGNAFPDTHPEIEKAMSEGIPVIRYHKFLGDYMKKFTSVAVTGAHGKTSTTGLLAHVI 123 

Query: 125 KNITDTSYLIGDGTGRGSANAQYFVFESDEYERHFMPYHPEYSIITNIDFDHPDYFTGIA 184 
+N TS+LIGDGTG+G+ N++YFVFE+ EY RHF+ Y P+Y+I+TNIDFDHPDYF+ I 
Sbjct: 124 QNAKPTSFLIGDGTGQGNENSEYFVFEACEYRRHFLSYQPDYAIMTNIDFDHPDYFSSID 183 

Query: 185 DVRNAFNDYAKQVKKALFVYGEDDELKKIEAPAPIYYYGFEEGNDFIAYDITRTTNGSDF 244 

DV +AF + A QV K + G+D+ L KI A P+ YYG E NDF A +I ++T G+ F 
Sbjct: 184 DVFDAFQEMALQVNKGIIACGDDEHLPKIHANVPVVYYGTGEENDFQARNIVKSTEGTTF 243 


Query: 245 KVKHQGEVIGQFHVPAYGKHNILNATAVIANLFVAGIDMALVADHLKTFSGVKRRFTEKI 304 

V + F++PAYG HN+LN+ AVIA ID +++ LK+F GVKRRF EK 

Sbjct: 244 DVFVRWTFYDTFYIPAYGHHIsrVLMSL^^ 303 

Query: 305 INDTIIIDDFAHHPTEIVATIDAARQKYPSKEIVAIFQPHTFTRTIALLEDFACALNEAD 364 

+ D ++IDD+AHHPTEI TI+AARQKYP +EIVA+FQPHTFTRT L++FA +L+ AD 
Sbjct: 304 LGDQVLIDDYAHHPTEIKVTIEAARQKYPDREIVAVFQPHTFTRTQQFLDEFAESLSGAD 363 

Query: 365 SVYLAQIYGSAREVDKGEVKVEDLAAKIIKPSQVVTVENVSPLLDHDNAVYVFMGAGDIQ 424 
VYL I+GSARE + G++ + DL K I ++++ ++ S L HD AV +FMGAGDIQ 

Sbjct: 364 CVYLCDIFGSARE-NAGKLTIGDLQGK-IHNAKLIEEDDTSVLKAHDKAVLIFMGAGDIQ 421 

Query: 425 LYEHSFEELLA 435 
Y ++E ++A 
Sbjct: 422 KYMRAYENVMA 432 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 369/443 (83%), Positives = 406/443 (91%), Gaps = 1/443 (0%) 
Query: 1 MSKTYHFIGIKGSGMSALALMLHQMGHNVQGSDVDKYYFTQRGLEQAGVTILPFSPNNIS 60 

MSKTYHFIGIKGSGMSALALMLHQMGH VQGSDV+KYYFTQRGLEQAG+TILPFS +NI+ 
Sbjct: 1 MSKTYHFIGIKGSGMSALALMLHQMGHKVQGSDVEKYYFTQRGLEQAGITILPFSEDNIT 60 

Query: 61 EDLEIIAGNAFRPDNNEELAYVIEKGYQFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGL 120 

D+E+I GNAFR +NN+E+AY + FKRYH+FLGDFM+ F S VAGAHGKTSTTGL 

Sbjct: 61 PDMELIVGNAFR-ENNKEVAYALRHQIPFKRYHDFLGDFMKSFISFAVAGAHGKTSTTGL 119 

Query: 121 LAHVLKNITDTSFLIGDGTGRGSANANYFVFEADEYERHFMPYHPEYSIITNIDFDHPDY 180 
L+HVLKNITDTS+LIGDGTGRGSANA YFVFE+DEYERHFMPYHPEYSIITNIDFDHPDY 

Sbjct: 120 LSHVLKNITDTSYLIGDGTGRGSANAQYFVFESDEYERHFMPYHPEYSIITNIDFDHPDY 179 

Query: 181 FTGLEDVFNAFNDYAKQVQKGLFIYGEDPKLHEITSEAPIYYYGFEDSNDFIAKDITRTV 240 
FTG+ DV NAFNDYAKQV+K LF+YGED +L +I + APIYYYGFE+ NDFIA DITRT 
Sbjct: 180 FTGIADVRNAFNDYAKQVKKALFVYGEDDELKKIEAPAPIYYYGFEEGNDFIAYDITRTT 239 

Query: 241 NGSDFKVFYNQEEIGQFHVPAYGKHNILNATAVIANLYIMGIDMALVAEHLKTFSGVKRR 300 

NGSDFKV + E IGQFHVPAYGKHNILNATAVIANL++ GIDMALVA+HLKTFSGVKRR 
Sbjct: 240 NGSDFKVKHQGEVIGQFHVPAYGKHNILNATAVIANLFVAGIDMALVADHLKTFSGVKRR 299 


Query: 301 FTEKIIDDTVIIDDFAHHPTEIIATLDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHA 360 

FTEKII+DT+IIDDFAHHPTEI+AT+DAARQKYPSKEIVAIFQPHTFTRTIALL++FA A 
Sbjct: 300 FTEKIINDTIIIDDFAHHPTEIVATIDAARQKYPSKEIVAIFQPHTFTRTIALLEDFACA 359 

Query: 361 LSQADSVYLAQIYGSAREVDNGEVKVEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMG 420 

L++ADSVYLAQIYGSAREVD GEVKVEDLAAKI+K S +VTVENVSPLL+HDNAVYVFMG 
Sbjct: 360 LNEADSVYLAQIYGSAREVDKGEVKVEDLAAKIIKPSQVVTVENVSPLLDHDNAVYVFMG 419 

Query: 421 AGDIQLYERSFEELLANLTKNTQ 443 
AGDIQLYE SFEELLANLTKN Q 

Sbjct: 420 AGDIQLYEHSFEELLANLTKNNQ 442 

SEQ ID 4812 (GBS157) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 24 (lane 11; MW 49kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 8; MW 74kDa), Figure 33 
(lane 8; MW 74kDa) and Figure 37 (lane 3; MW 74kDa). 

The GBS157-GST fusion product was purified (Figure 112A; see also Figure 200, lane 3) and used to 
immunise mice (lane 1+2 product; 19.5 µg/mouse). The resulting antiserum was used for Western blot 
(Figure 112B), FACS, and in the in vivo passive protection assay (Table III). These tests confirm that the 
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

SEQ ID 4812 (GBS157) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 183 (lanes 11-13; MW 74kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1559 

A DNA sequence (GBSx1651) was identified in S.agalactiae <SEQ ID 4815> which encodes the amino 
acid sequence <SEQ ID 4816>. Analysis of this protein sequence reveals the following: 






Possible site: 19 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.1980 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>
The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4817> which encodes the amino acid 
sequence <SEQ ID 4818>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2731 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/201 (39%), Positives = 126/201 (51%), Gaps = 9/201 (4%) 

Query: 7 RFPLIADDEPVMSPLVKMNLYDNEDLINNIRDPYQEKTYQSMVKSNYEHEEISHPKVIEN 66 

+FPL+AD + P +M LY+NEDLI NIR +YQ+K Y + ++ EE + 
Sbjct: 5 QFPLVADGIAISDPAKQMALYENEDLITNIRGYYQDKEYDDIARN----EEFTAKATSRQ 60 

Query: 67 DPVPPQ--SFVKKATELSKSRQFAKRSVREKRQAYYAKQEFKAPSKEAFQQQLKATVPKK 124 

P + S +K + ++RQ+AK+ ++EKRQAY AK+ P + + +QQ + P + 
Sbjct: 61 TPSSKRPCSNDEKHHYVKEARQKAKQDLKEKRQAYIAKEMAYVPKQVSKKQQPADSSPSQ 120 

Query: 125 QTQRKVTELSHLSDRLQQESYILAEIPIIFQEPDNTPNP-KTKKNNFDFLKRSQVYNKQD 183 
+ + TE+S + +L Q++YILAE+P ++EP N P TKKNN+DFLK SQ+YN ++ 

Sbjct: 121 K-QATTEMSRFTKKLHQDNYILAELPKEYKEPKNLPQQGTTKKNNYDFLKSSQIYNNKE 178 

Query: 184 NQFHKERAKAQELNLTRFKDI 204 

+ +E+ AQELNL+RF+D+ 
Sbjct: 179 MRQQREKTIAQELNLSRFEDL 199 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1560 

A DNA sequence (GBSx1652) was identified in S.agalactiae <SEQ ID 4819> which encodes the amino 
acid sequence <SEQ ID 4820>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4959 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1561 

A DNA sequence (GBSx1653) was identified in S.agalactiae <SEQ ID 4821> which encodes the amino 
acid sequence <SEQ ID 4822>. This protein is predicted to be SNF. Analysis of this protein sequence 
reveals the following: 
Possible site: 28 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.32 Transmembrane 743 - 759 ( 743 - 759) 

Final Results 

bacterial membrane Certainty=0.1128 (Affirmative) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) <succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA67095 GB:X98455 SNF [Bacillus cereus] 
Identities = 259/678 (38%), Positives = 406/678 (59%), Gaps = 21/678 (3%) 

Query: 369 QNEILLQMVFDYGNDLTVHNRQELEQLTFASHFKHEEKVFKLLEKYGFAPHFSTSHPAYS 428 
+N +L + F YGN + ++ + F K E+++ ++ + FA + ++ 
Sbjct: 388 KNRLLAGLEFHYGNWINPLEEDGQPSVFNRDEKKEKEILDIMSESAFAKT-EGGYFMHN 446 

Query: 429 AQELYDFYTYMLPQFKKMGTV--SLSAKLESYRLIERPQIDIEAKGSL--LDISFDFSDL 484 
+ Y+F +++P K+ + + + KL++ PI+K + L FD + 
Sbjct: 447 EEAEYNFLYHIVPTLKGLVDIYATTAIKLRIHKGDTAPLIRVRRKERIDWLSFRFDIKGI 506 

Query: 485 LENDVDQALVALFDNNPYFVNKSGQLVIFD-EETKKVSATLQ--GLRARRAKNGHIELDN 541 
E ++ L AL + Y+ +G L+ + +E +++ ++ G+R + + 
Sbjct: 507 PEAEIKGVLAALEEKRKYYRLANGSLLSLESKEFNEINQFVKESGIRKEFLHGEEVNVPL 566 

Query: 542 IAAFQLSELFANQDNVSFSQHFYQLIEDLRHPEKFK--IPGLSVSASLRDYQLTGVRWLS 599 
I + + + +S + L+E +++P+K K +P ++ A +R+YQ+ G W+ 
Sbjct: 567 IRSWOTINGLHEGNVLSLDESVQDLVESIQNPKia^KFTVPP-TLHAVMREYQVYGFEWMK 625 

Query: 600 MLDHYGFAGILADDMGLGKTLQTISFLSTKLT--RDSR--VLILSPSSLIYNWQDEFHKF 655 
L +Y F GILADDMGLGKTLQ+I+++ + L R+ + +L++SPSSL+YNW E KF 
Sbjct: 626 TLAYYRFGGILADDMGLGKTLQSIAYIDSVLPEIREKKLPILVVSPSSLVYNWFSELKKF 685 

Query: 656 APDVDVAVAYGSKIRRDEIIAE--RHQVIITSYSSFRQDFETYSEGNYDYLILDEAQVMK 713 
AP + +A G++ R +I+ + V+ITSY R+D +Y+ + L LDEAQ K 
Sbjct: 686 APHIRAVIADGNQTERRKILKDVAEFDVVITSYPLLRRDVRSYARP-FHTLFLDEAQAFK 744 

Query: 714 NAQTKIAHSLRSFEVKNCFALSGTPIENKLLEIWSIFQIILPGLLPGKKEFLKLNPKQVA 773 
N T+ A ++++ + + F L+GTP+EN L E+WSIF ++ P LLPG+KEF L + +A 
Sbjct: 745 NPTTQTARAVKTIQAEYRFGLTGTPVENSLEELWSIFHWFPELLPGRKEFGDLRREDIA 804 

Query: 774 RYIKPFVMRRRKEEVLPELPDLIEMNYPNEMTDSQKVIYLAQLRQI-QESIQHSSDADLN 832 
+KPFV+RR KE+VL ELPD IE +E+ QK +Y A L ++ +E+++H L 
Sbjct: 805 NAVKPFVLRRLKEDVLQELPDKIEHLQSSELLPDQKRLYAAYLAKLREETLKHLDKDTLR 864 

Query: 833 RRKIEILSGITRLRQICDTPRLFMD-YDGESGKLESLRQLLTQIKENGHRALIFSQFRGM 891 
+ KI IL+G+TRLRQIC+ P LF+D YGSKLEL +L++ GR LIFSQF M 
Sbjct: 865 KNKIRILAGLTRLRQICNHPALFVDDYKGSSAKLEQLLDILEECRSTGKRILIFSQFTKM 924 

Query: 892 LDIAEREMVAMGLTTYKITGSTPANERHEMTRAFNAGSKDAFLISLKAGGVGLNLTGADT 951 
L I RE+ + + + G+TP+ ER E+ FN G D FLISLKAGG GLNLTGADT 
Sbjct: 925 LSIIGRELNROAIPYFYLDGNTPSQERVELCNRENEGEGDLFLISLKAGGTGLNLTGADT 984 

Query: 952 VVLIDLWWNPAVEMQAISRAHRLGQKENVEVYRLITRGTIEEKILEMQETKKHLVTTVLD 1011 
V+L DLWWNPAVE QA RA+R+GQK V+V +L+ GTIEEK+ E+QE+KKHL+ V++ 
Sbjct: 985 VILYDLWWNPAVEQQAADRAYRMGQKNTVQVIKLVAHGTIEEKMHELQESKKHLIAEVIE 1044 

Query: 1012 -GNETHASMSVDDIREIL 1028 
G E +S++ ++IR+IL 
Sbjct: 1045 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4823> which encodes the amino acid 
sequence <SEQ ID 4824>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have no N-terminal signal sequence 
Final Results 

bacterial cytoplasm Certainty=0.3909 (Affirmative) <succ>

bacterial membrane Certainty=0.0000 (Not Clear) <succ>

bacterial outside Certainty=0.0000 (Not Clear) <succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 674/1031 (65%), Positives = 834/1031 (80%), Gaps = 2/1031 (0%) 

Query: 1 MSRMIPGRIRNQGIELYEQGLVSLISQEGNLLKAKVGDCQIEYSLVTEETKCSCDFFARK 60 

M+R+IPGR+RN+GI+LYEQGLVS +L+ +V Q++Y E+ C CD F K 

Sbjct: 2 MARLIPGRVRNEGIKLYEQGLVSFQDDNKGILQIEVETYQVQYGADDEDITCQCDTFHMK 61 

Query: 61 GYCQHLAALEHFLKNDPEGKAILSKVQVQQESQQETKKKTSFGSVFLDSLIINEDDTIKY 120 

YC+H+AA+E+FLKND +GK L ++ Q + ++ TKK TSFGS+FLDSL +NEDD++KY 
Sbjct: 62 HYCKHIAAVEYFLKNDQKGKLFLKQLTNQTKIKETTKKMTSFGSLFLDSLAMNEDDSVKY 121 

Query: 121 QLSAQGEQNPYANDIWWTLKIRRLPDDRSYVIRDIKAFLNTVRKEAYYQIGKQYFETLSL 180 

+LSA G ++P+++D WW+LKI RLPDDRSYVIRDIK FL ++KE +YQIGK YFE LS 
Sbjct: 122 RLSALGSRSPFSSDYWWSLKINRLPDDRSYVIRDIKGFLQLIKKEGFYQIGKNYFEQLSW 181 

Query: 181 IQFDETSQELIEFLWRLIPSHSSKIDLEFILPNQGRHLSLTRGFFEEGVTLMNALENFSF 240 

+QFD +SQ LIEFLWRL S + K D E I PN RHL L GFFEEG+ + +L +F+F 
Sbjct: 182 LQFDPSSQALIEFLWRLA-SDTDKGDNENIFPNHARHLRLPSGFFEEGIHYLTSLYDFTF 240 

Query: 241 ESDFHQFNHLYFKELEGEDHLYQFKVIVHRQSIELEIKEKDLKPLFANSYLFYRDTFYHL 300 

E ++HL+ + LE E LY+FKV VHR+SIEL+I EK+++ LF N YL Y+DTFYHL 

Sbjct: 241 EGPSQTYHHLFVRSLEAEAGLYEFKVEVHRKSIELQIAEKNVQYLFDNDYLLYQDTFYHL 300 

Query: 301 NLKQEKMTOAIRSLPIEGDIJUCHIHFDLDDQDKLaAHLLDFKEiGLVnAPRSFSIHDFKV 360 

LKQ KMV AIRSLPIE DIAKHIHFDLDD KLAA L DFK+IGLV+AP+SF+I DF+V 
Sbjct: 301 TLKQRKMVQAIRSLPIEADLAKHIHFDLDDHAKLAASLSDFRQIGLVEAPKSFAIRDFEV 360 

Query: 361 NFEFDINSQNEILLQMVFDYGNDLTVHNRQELEQLTFASHFKHEEKVFKLLEKYGFAPHF 420 

F+FD+ +++EI Q++FDYGN V ++ LE L FASH K EEK+ + L +GF+P F 
Sbjct: 361 TFQFDLLNRDEISCQLMFDYGN-YQVSDKASLEALPFASHLKKEEKINRSLLAFGFSPQF 419 

Query: 421 STSHPAYSAQELYDFYTYMLPQFKKMGTVSLSAKLESYRLIERPQIDIEAKGSLLDISFD 480 

+ SA+ELY F+ +P F+++G V+LS +++ ++ E P+I I LLDISFD 

Sbjct: 420 YSKKRLTSAKELYTFFEETVPCFERLGNVALSTAIQALQVKEMPKIAIRRNQGLLDISFD 479 

Query: 481 FSDLLENDVDQALVALFDNNPYFVNKSGQLVIFDEETKKVSATLQGLRARRAKNGHIELD 540 

FS ++END+DQA+ ALF NNPYFV+++GQLV+FD+ET+KVS +LQ LRAR+ KNGH++LD 
Sbjct: 480 FSTIIENDIDQAVTALFQNNPYFVSQTGQLVVFDDETQKVSKSLQELRARQLKNGHLQLD 539 

Query: 541 NIAAFQLSELFANQDNVSFSQHFYQLIEDLRHPEKFKIPGLSVSASLRDYQLTGVRWLSM 600 

I A Q+S+LF +V FS+ +L L+HPE F I L V A +RDYQ GV+WLSM 
Sbjct: 540 GIRALQVSKLFEGMTSVHFSKELEELAYHLQHPETFSIKPLPVKAQMRDYQRNGVQWLSM 599 

Query: 601 LDHYGFAGIIADDMGLGKTLQTISFLSTKLTRDSRVLILSPSSLIYNWQDEFHKFAPDVD 660 

L+HYGF GILADDMGLGKTLQT++FL++ L DS+VLILSPSSLIYNW DE KF P +D 
Sbjct: 600 IMIYGFGGILADDMGLGKTLQTIAFLASHLKSDSKVLILSPSSLIYNWFDECQKFTPQLD 659 

Query: 661 VAVAYGSKIRRDEIIAERHQVIITSYSSFRQDFETYSEGNYDYLILDEAQVMKNAQTKIA 720 

V V+YG K RD+II E HQ+ ITSYSSFRQDFETY +YDYLILDEAQV+KNAQTKI+ 
Sbjct: 660 VVVSYGLKQIRDQIIEEGHQITITSYSSFRQDFETYQAFHYDYLILDEAQVIKNAQTKIS 719

Query: 721 HSLRSFEVKNCFALSGTPIENKLLEIWSIFQIILPGLLPGKKEFLKLNPKQVARYIKPFV 780 

H LR+F NCFALSGTPIENK+LEIWSIFQI+LPGLLP KKEFLKL +QV+RYIKPFV 
Sbjct: 720 HCLRAFNTANCFALSGTPIENKMLEIWSIFQIVLPGLLPTKKEFLKLTAEQVSRYIKPFV 779

Query: 781 MRRRKEEVLPELPDLIEMNYPNEMTDSQKVIYLAQLRQIQESIQHSSDADLNRRKIEILS 840 

MRR+KE+VLPELPDLIE+NY NEMTD QK IYLAQLRQ+Q+ I++SSD D++R+KIEILS 
Sbjct: 780 MRRKKEDVLPELPDLIEINYSNEMTDEQKAIYLAQLRQMQDQIRNSSDVDISRQKIEILS 839 

Query: 841 GITRLRQICDTPRLFMDYDGESGKLESLRQLLTQIKENGHRALIFSQFRGMLDIAEREMV 900 
GITRLRQICDTP LFMDY G+SGKL+SLR LLTQIKENGHRALIFSQFRGMLD+A++EM 
Sbjct: 840 GITRLRQICDTPSLFMDYQGKSGKLDSLRILLTQIKENGHRALIFSQFRGMLDLAKQEMT 899

Query: 901 AMGLTTYKITGSTPANERHEMTRAFNAGSKDAFLISLKAGGVGLNLTGADTVVLIDLWWN 960 
A+GLT+Y++TGSTPANER EMTRAFN GSKDAFLISLKAGGVG+NLTGADTV+LIDLWWN 
Sbjct: 900 ALGLTSYQMTGSTPANERQEMTRAFraGSKDAFLISLKAGGVGimTGADTVILIDLWWN 959

Query: 961 PAVEMQAISRAHRLGQKENVEVYRLITRGTIEEKILEMQETKKHLVTTVLDGNETHASMS 1020 

PAVEMQAISRA+R+GQKENVEVYRLITRGTIEEKILE+QE+K++LVTTVLDGNE+ ASMS 
Sbjct: 960 PAVEMQAISRAYRIGQKENVEVYRLITRGTIEEKILELQESKRNLVTTVLDGNESRASMS 1019 


Query: 1021 VDDIREILGVS 1031 

+++I+EILG++ 
Sbjct: 1020 IEEIKEILGIN 1030

SEQ ID 4822 (GBS369) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 64 (lane 5; MW 120kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 69 (lane 6; MW 142kDa).

The GBS369-GST fusion product was purified (Figure 215, lane 7) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 303), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1562 

A DNA sequence (GBSx1654) was identified in S.agalactiae <SEQ ID 4825> which encodes the amino
acid sequence <SEQ ID 4826>. Analysis of this protein sequence reveals the following:
Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3391 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
There is also homology to SEQ ID 1034:

Identities = 34/38 (89%) , Positives = 37/38 (96%) 

Query: 1 MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACQFS 38 
+EKEAKQ+IDLKRNLFKIDVRAQKDEEKVFMRTAC+ S 
Sbjct: 1 LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQS 38
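The Identities figure for this alignment can be reproduced from the two aligned rows. The following is a minimal Python sketch; the helper names `count_identities` and `reported_percent` are illustrative and are not part of BLAST. It assumes a gap-free alignment of equal-length rows and truncates the percentage to a whole number, which matches the identity and gap percentages reported in this section (for example 34/38 is given as 89% and 17/313 gaps as 5%). It does not attempt the Positives figure, which depends on a substitution matrix.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count positions where two equal-length aligned rows carry the same residue.

    Gap characters ('-') never count as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def reported_percent(count: int, length: int) -> int:
    """Percentage truncated toward zero, as the summary lines here appear to show it."""
    return (100 * count) // length


# Rows of the SEQ ID 1034 alignment above (no gaps, 38 columns).
query = "MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACQFS"
sbjct = "LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQS"

ident = count_identities(query, sbjct)
print(f"Identities = {ident}/{len(query)} ({reported_percent(ident, len(query))}%)")
# prints: Identities = 34/38 (89%)
```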

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1563 

A DNA sequence (GBSx1656) was identified in S.agalactiae <SEQ ID 4827> which encodes the amino
acid sequence <SEQ ID 4828>. This protein is predicted to be phosphoglycerate dehydrogenase (era2). 
Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3709 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAAS8823 GB:AB016077 phosphoglycerate dehydrogenase
[Streptococcus mutans]
Identities = 377/436 (86%) , Positives = 414/436 (94%)

Query: 1 MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIYTTGEWLNRKFSLIDTGG 60
M LPTVAIVGRPNVGKS LFNRIAGERISIVEDVEGVTRDRIYT  EWLNR+FS+IDTGG
Sbjct: 1 MALPTVAIVGRPNVGKSALFNRIAGERISIVEDVEGVTRDRIYTKAEWLNRQFSIIDTGG 60

Query: 61 IDDVDAPFMEQIKHQADIAMTEADVIVFVVSGKEGVTDADEYVSRILYKTNKPVILAVNK 120
IDDVDAPFMEQIKHQADIAMTEADVIVFVVS KEG+TDADEYV++ILY+T+KPV LAVNK
Sbjct: 61 IDDVDAPFMEQIKHQADIAMTEADVIVFVVSAKEGITDADEYVAKILYRTHKPVTLAVNK 120

Query: 121 VDNPEMRNDIYDFYSLGLGDPYPLSSVHGIGTGDILDAIVENLPVEEENENPDIIRFSLI 180
VDNPEMR+ IYDFY+LGLGDPYP+SS HGIGTGD+LDAIV+NLP E + E+ DII+FSLI
Sbjct: 121 VDNPEMRSAIYDFYALGLGDPYPVSSAHGIGTGDVLDAIVDNLPAEAQEESSDIIKFSLI 180

Query: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTNFVDSQGQEYTMIDTAGMRKSGKVY 240
GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDT F D +GQE+TMIDTAGMRKSGKVY
Sbjct: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTTFTDEEGQEFTMIDTAGMRKSGKVY 240

Query: 241 ENTEKYSVMRSMRAIDRSDVVLMVINAEEGIREYDKRIAGFAHETGKGIIIVVNKWDTIE 300
ENTEKYSVMR+MRAIDRSD+VLMV+NAEEGIREYDKRIAGFAHE GKGI++VVNKWD I+
Sbjct: 241 ENTEKYSVMRAMRAIDRSDIVLMVIJSIAEEGIREYDKRIAGFAHEAGKGIVVVVNKWDAIK 300

Query: 301 KDNHTVSQWEADIRDNFQFLSYAPIIFVSAETKQRLHKLPDMIKRISESQNKRIPSAVLN 360
KDN TV+QWE DIRDNFQ++ YAPI+FVSA TKQRLHKLPD+IK++S+SQN RIPS+VLN
Sbjct: 301 KDNRTVAQWETDIRDNFQYIPYAPIVFVSAVTKQRLHKLPDVIKQVSQSQNTRIPSSVLN 360

Query: 361 DVIMDAIAINPTPTDKGKRLKIFYATQVAVKPPTFVVFVNEEELMHFSYLRFLENQIREA 420
DV+MDA+AINPTPTDKGKRLKIFYATQV+VKPPTFV+FVNEEELMHFSYLRFLENQIR+A
Sbjct: 361 DVVMDAVAINPTPTDKGKRLKIFYATQVSVKPPTFVIFVNEEELMHFSYLRFLENQIRQA 420

Query: 421 FVFEGTPINLIARKRK 436
FVFEGTPI LIARKRK
Sbjct: 421 FVFEGTPIRLIARKRK 436


A related DNA sequence was identified in S.pyogenes <SEQ ID 4829> which encodes the amino acid 
sequence <SEQ ID 4830>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3463 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 403/436 (92%) , Positives = 422/436 (96%) 

Query: 1 MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIYTTGEWLNRKFSLIDTGG 60

MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIY TGEWLNR+FSLIDTGG
Sbjct: 1 MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIYATGEWLNRQFSLIDTGG 60

Query: 61 IDDVDAPFMEQIKHQADIAMTEADVIVFVVSGKEGVTDADEYVSRILYKTNKPVILAVNK 120

IDDVDAPFMEQIKHQA IAM EADVIVFVVSGKEGVTDADEYVS+ILY+TN PVILAVNK
Sbjct: 61 IDDVDAPFMEQIKHQAQIAMEEADVIVFVVSGKEGVTDADEYVSKILYRTNTPVILAVNK 120



Query: 121 VDNPEMRNDIYDFYSLGLGDPYPLSSVHGIGTGDILDAIVENLPVEEENENPDIIRFSLI 180
VDNPEMRNDIYDFYSLGLGDPYP+SSVHGIGTGD+LDAIVENLPVEE EN DIIRFSLI
Sbjct: 121 VDNPEMRNDIYDFYSLGLGDPYPVSSVHGIGTGDVLDAIVENLPVEEAEENDDIIRFSLI 180 

Query: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTNFVDSQGQEYTMIDTAGMRKSGKVY 240 

GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDT+F D+ GQE+TMIDTAGMRKSGK+Y
Sbjct: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTHFTDADGQEFTMIDTAGMRKSGKIY 240 

Query: 241 ENTEKYSVMRSMRAIDRSDVVLMVINAEEGIREYDKRIAGFAHETGKGIIIVVNKWDTIE 300

ENTEKYSVMR+MRAIDRSDVVLMVINAEEGIREYDKRIAGFAHE GKG+IIVVNKWDTI+
Sbjct: 241 ENTEKYSWRRMRAIDRSDVVLMVINAEEGIREYDKRIAGFAHEAGKGMIIVVNKWDTID 300

Query: 301 KDNHTVSQWEADIRDNFQFLSYAPIIFVSAETKQRLHKLPDMIKRISESQNKRIPSAVLN 360

KDNHTV++WEADIRD FQFL+YAPIIFVSA TKQRL+KLPD+IKRISESQNKRIPSAVLN
Sbjct: 301 KDNHTVAKWEADIRDQFQFLTYAPIIFVSALTKQRLNKLPDLIKRISESQNKRIPSAVLN 360

Query: 361 DVIMDAIAINPTPTDKGKRLKIFYATQVAVKPPTFVVFVNEEELMHFSYLRFLENQIREA 420

DVIMDAIAINPTPTDKGKRLKIFYATQV+VKPPTFVVFVNEEELMHFSYLRFLENQIR A
Sbjct: 361 DVIMDAIAINPTPTDKGKRLKIFYATQVSVKPPTFVVFVNEEELMHFSYLRFLENQIRAA 420

Query: 421 FVFEGTPINLIARKRK 436

F FEGTPI+LIARKRK
Sbjct: 421 FTFEGTPIHLIARKRK 436

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1564 

A DNA sequence (GBSx1657) was identified in S.agalactiae <SEQ ID 4831> which encodes the amino
acid sequence <SEQ ID 4832>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2734 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00359 GB:AF008220 DnaI [Bacillus subtilis]
Identities = 105/313 (33%) , Positives = 191/313 (60%) , Gaps = 17/313 (5%) 

Query: 1 MKSVGQALENQGRVP--RNTNDELIQMILADAQVAEFIKTHQ---LSQREINISMSKFNQF 56

M+ +G++L+ P + +++ + ++ D V F+K ++ + Q+ I S++K ++ 

Sbjct: 1 MEPIGRSIQGVTGRPDFQIOILEQMKEKVMKIJQDVQAFLKENEEVIDQKMIEKSLNKLYEY 60 

Query: 57 LIERQKFKNKDSQYIAKGYEPILVMNEGYADVSYLE--TRELIEAQKKQAISDRI 109

IE+ K ++++ + +GY P LV+N D+ Y E + ++ QKKQ + 

Sbjct: 61 - IEQSKNCSYCSEDENCTNLLEGYHPKL VWGRSIDIEYYECPVKRKLDQQKKQ- -QSLM 117 

Query: 110 NLVNLPKSYRNIRMTDFDINNESRMKAMSQLLDFVETYPSYNH-KGLYLYGDMGVGKSYL 168

+ + + DI++ SR+ + DF+++Y KGLYLYG GVGK+++ 

Sbjct: 118 KSMYIQQDLLGATFQQVDISDPSRLAMFQHVTDFLKSYNETGKGKGLYLYGKFGVGKTFM 177 

Query: 169 MAAMARELSERKGVSTTLLHFPSFAIDVKNAISSGTVKDEIDAVKSVPILILDDIGAEQA 228 

+AA+A EL+E++ S+ +++ P F ++KN++ T++++++ VK+ P+L+LDDIGAE 
Sbjct: 178 IAAIANELAEKE-YSSMIVYVPEFVRELKNSLQDQTI^EEKLN^IVKTTPVLMLDDIGAESM 236

Query: 229 TSWVRDEILQVILQHRMLEELPTFFTSNYSFNDLERKWA-NIKGSDETWQAKRVMERVRY 287

TSWVRDE++ +LQHRM ++LPTFF+SN+S ++L+ + + +G E +A R+MER+ Y 
Sbjct: 237 TSWVRDEVIGTVLQHRMSQQLPTFFSSNFSPDELKHHFTYSQRGEKEEVKAARLMERILY 296 

Query: 288 LAIEFHLEGPNRR 300 
LA L+G NRR 
Sbjct: 297 LAAPIRLDGENRR 309

A related DNA sequence was identified in S.pyogenes <SEQ ID 4833> which encodes the amino acid 
sequence <SEQ ID 4834>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1944 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 228/300 (76%) , Positives = 264/300 (88%)

Query: 1 MKSVGQALENQGRVPRNTNDELIQMILADAQVAEFIKTHQLSQREINISMSKFNQFLIER 60
M+ +G+ + G+ R +D+LIQ I+AD +VA FI H LSQ +IN+S+SKFNQFL+ER
Sbjct: 1 MEKIGETMAKLGQNTRVNSDQLIQTIIADPEVASFISQHHLSQEQINLSLSKFNQFLVER 60

Query: 61 QKFKNKDSQYIAKGYEPILVMNEGYADVSYLETRELIEAQKKQAISDRINLVNLPKSYRN 120
QK++ KD YIAKGY+PIL MNEGYADVSYLET+EL+EAQK+ AIS+RI LV+LPKSYR+
Sbjct: 61 QKYQLKDPSYIAKGYQPIIAMNEGYADVSYLETKELVEAQKQAAISERIQLVSLPKSYRH 120

Query: 121 IRMTDFDINNESRMKAMSQLLDFVETYPSYNHKGLYLYGDMGVGKSYLMAAMARELSERK 180
I ++D D+NN SRM+A S +LDFVE YPS KGLYLYGDMG+GKSYL+AAMA ELSE+K
Sbjct: 121 IHLSDIDVNNASRMEAFSAILDFVEQYPSAEQKGLYLYGDMGIGKSYLLAAMAHELSEKK 180

Query: 181 GVSTTLLHFPSFAIDVKNAISSGTVKDEIDAVKSVPILILDDIGAEQATSWVRDEILQVI 240
GVSTTLLHFPSFAIDVKNAIS+G+VK+EIDAVK+VP+LILDDIGAEQATSWVRDE+LQVI
Sbjct: 181 GVSTTLLHFPSFAIDVKNAISNGSVKEEIDAVKNVPVLILDDIGAEQATSWVRDEVLQVI 240

Query: 241 LQHRMLEELPTFFTSNYSFNDLERKWANIKGSDETWQAKRVMERVRYLAIEFHLEGPNRR 300
LQ+RMLEELPTFFTSNYSF DLERKWA IKGSDETWQAKRVMERVRYLA EFHLEG NRR
Sbjct: 241 LQYRMLEELPTFFTSNYSFADLERKWATIKGSDETWQAKRVMERVRYLAREFHLEGANRR 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1565 

A DNA sequence (GBSx1658) was identified in S.agalactiae <SEQ ID 4835> which encodes the amino
acid sequence <SEQ ID 4836>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2660 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4837> which encodes the amino acid
sequence <SEQ ID 4838>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2135 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/391 (55%), Positives = 309/391 (78%) 



Query: 1 MMSPIDEFTYIKQNKIVVDSNSLIQLYFPIMGSDAMALYDYFVHFFDDGIRRHKFSEVLN 60
MM PID FTY+K+NK+ DS +LIQLYFPI+GSDA+++Y YF+HFFDDG++RHKFS++LN
Sbjct: 1 MMKPIDTFTYLKRNKVTLDSVTLIQLYFPIIGSDAVSIYQYFIHFFDDGLQRHKFSDILN 60

Query: 61 HLQYGMPRFQDALVMLTALDLLTVYQATGTYLVKLNQAMSNELFLSNPIYRRLLEKRIGE 120
HLQ+GM RF+DAL +LTA++L++VYQ + TYL+ L+Q +S +LF +P Y RLLE++IGE
Sbjct: 61 HLQFGMKRFEDALAILTAMELVSVYQLSDTYLITLHQPLSRDLFFQHPAYSRLLEQKIGE 120

Query: 121 VAVAELDMKIPKNARDISKKFTDVFSDLGQPKQEVNRSKNVFDLESFKRLMMRDGLRFMN 180
VAV+EL + +P AR+ISK+F+D+F G + + FDL SF++LM+RDGL+F +
Sbjct: 121 VAVSELQVTVPSQARNISKRFSDIFGVQGDLTOVPQKPQKNFDLSSFQQLMVRDGLQFED 180

Query: 181 EKDDVTjGIYSVSELYHLNWYDTYQIAKQTAINGMIAPQRM^^ 240
+ D++ +YS++E Y + W+DTYQ+AK TA+NG I P+R+ ++N+ ++F+ E
Sbjct: 181 NQKDIISLYSIAEQYDMTWFDTYQIAKATAVNGKIRPERLLAKKNQSMTKPSKENFSQAE 240

Query: 241 KVILRESKIDSALVFLEKIKRSRKAVTTSGEKTLLEDLAKMNFLDEVINVMVLYTLNKTK 300
++ILRE+K DSALVFLEKI K++R+A T E+ LL+ LAKMNFLD+VINVMVLYT NKTK
Sbjct: 241 QIILREAKQDSALVFLEKIKKARRATITKDERILLQTIAKMNFLDDVINVMVLYTFNKTK 300

Query: 301 SAOT1NKA.YIMKVANDFAFQNVMTAEDAVI1KIRDFSDQKVRTKTETKKKQSNVPEWSNPDY 360
SANL K+Y++K+ANDFA+Q V TAE+A++ +R F+D++ R +++ K QSNVP+WSNPDY
Sbjct: 301 SAMLQKSYVLKMANDFAYQKVSTAEEAIVVLRAFTDRQSRRQSKVKTSQSNVPKWSNPDY 360

Query: 361 KDEVSPEKEIELEQFKTDALKRLERLGKDGE 391
++ S E++ +L+QFK ALKRLE LGK G+
Sbjct: 361 QETTSQEEQAKLDQFKQAALKRLENLGKGGD 391





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1566 

A DNA sequence (GBSx1659) was identified in S.agalactiae <SEQ ID 4839> which encodes the amino
acid sequence <SEQ ID 4840>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4485 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06865 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 80/150 (53%) , Positives = 115/150 (76%) 

Query: 1 MRCPKCGYNKSSVVDSRQAEEGTTIRRRRECEKCGNRFTTFERLEELPLLVIKKDGTREQ 60

MRCP C +N + V+DSR A EG +IRRRRECE C +RFTTFE +EE+PL+V+KKDGTR++ 
Sbjct: 1 MRCPACHHNGTRVLDSRPAHEGRSIRRRRECTSCNHRFTTFEMIEEVPLIVVKKDGTRQE 60

Query: 61 FSRDKILNGIIQSAQKRPVSSEDIENCILRIERKIRSEYEDEVSSITIGNLVMDELAELD 120

FS DKIL G+I++ +KRPV E +E + +ER++R + ++EV S IG LVM+ LA +D 
Sbjct: 61 FSSDKILRGLIRACEKRPVPLETLEGIVNFjVERELRGC^KNEVDSKEIGELVMERLANVD 120 

Query: 121 EITYVRFASVYKSFKDVDEIEELLQQITKR 150 

++ YVRFASVY+ FKD++ + L+++ +R 
Sbjct: 121 DVAYVRFASVYRQFKDINVFIQELKELMER 150 
A related DNA sequence was identified in S.pyogenes <SEQ ID 4841> which encodes the amino acid
sequence <SEQ ID 4842>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4365 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/155 (84%) , Positives = 143/155 (91%) 

Query: 1 MRCPKCGYNKSSVVDSRQAEEGTTIRRRRECEKCGNRFTTFERLEELPLLVIKKDGTREQ 60

+RCPKC Y+KSSVVDSRQAE+G TIRRRRECE+C RFTTFER+EELPLLVIKKDGTREQ
Sbjct: 1 VRCPKCNYHKSSVVDSRQAEDGNTIRRRRECEQCHTRFTTFERVEELPLLVIKKDGTREQ 60

Query: 61 FSRDKILNGIIQSAQKRPVSSEDIENCILRIERKIRSEYEDEVSSITIGNLVMDELAELD 120 
FSRDKILNG++QSAQKRPVSS DIEN I RIE+++R+ YE+EVSS IGNLVMDELAELD

Sbjct: 61 FSRDKILNGVVQSAQKRPVSSTDIENVISRIEQEVRTTYENEVSSTAIGNLVMDELAELD 120

Query: 121 EITYVRFASVYKSFKDVDEIEELLQQITKRVRSKK 155 
EITYVRFASVYKSFKDVDEIEELLQQIT RVR KK 
Sbjct: 121 EITYVRFASVYKSFKDVDEIEELLQQITNRVRGKK 155

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1567 

A DNA sequence (GBSx1660) was identified in S.agalactiae <SEQ ID 4843> which encodes the amino
acid sequence <SEQ ID 4844>. This protein is predicted to be CsrS (mtrB). Analysis of this protein
sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43)

INTEGRAL Likelihood = -9.66 Transmembrane 189 - 205 ( 187 - 212) 

Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco
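In each Final Results block in this section, the compartment labelled Affirmative is the one with the highest certainty, and the other two are reported as 0.0000. A minimal Python sketch of that selection follows, assuming lines in the cleaned form shown above (`<compartment> Certainty=<value> (<label>) < suco`). The regular expression and the function name `best_localisation` are illustrative only and are not part of PSORT.

```python
from __future__ import annotations

import re

# Matches lines such as "bacterial membrane Certainty=0.5522 (Affirmative) < suco".
LINE = re.compile(r"^(bacterial \w+)\s+Certainty=\s*([0-9.]+)\s+\((.+?)\)")


def best_localisation(block: str) -> tuple[str, float]:
    """Return the compartment with the highest certainty in a Final Results block."""
    scores = {}
    for line in block.splitlines():
        m = LINE.match(line.strip())
        if m:
            scores[m.group(1)] = float(m.group(2))
    if not scores:
        raise ValueError("no Certainty lines found")
    site = max(scores, key=scores.get)
    return site, scores[site]


results = """bacterial membrane Certainty=0.5522 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco"""

print(best_localisation(results))  # ('bacterial membrane', 0.5522)
```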

A related DNA sequence was identified in S.pyogenes <SEQ ID 2109> which encodes the amino acid 
sequence <SEQ ID 2110>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -6.32 Transmembrane 196 - 212 ( 189 - 214) 

Final Results 

bacterial membrane Certainty=0.3527 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below. 

Identities = 248/501 (49%) , Positives = 363/501 (71%) , Gaps = 4/501 (0%)
Query: 1 MKNKKDQFIGVKQPLSKKLSQLVFILFFSLFTVFSVLVYTSATRYVLHREKINVGRSLEK 60

M+N+K + K L K+LS + F+LFF +F+ F+++ Y+S ++L +EK +V +++ 
Sbjct: 1 MENQKQKQKKYKNSLPKRLSNIFFVLFFCIFSAFTLIAYSSTNYFLLKKEKQSVFQAVNI 60 

Query: 61 TRVRLSQANSSLTSDDILEILYNQVFADDIYPHKRQNGIVRTGESIDSILYVNQEMTLYD 120

RVRLS+ +S+ T +++ E+LY ++ + ++R+ I + L NQ++ +Y+ 

Sbjct: 61 VRWLSEVDSNFTLFJsnjAEVLYKNDKTHLRIDDRKGSRVIRS 120 

Query: 121 VNRKPVFST-LRTGMPTIGKSMGKVIISKVADM-EGFVGTKAIYSQKTGQLLGYVQIFYN 178 
++++ +F+T P + +G+V + D GF T+ +YS +TG+ +GYVQ+F++

Sbjct: 121 IDKQMIFTTDNEESSPGLHGPIGRVYHDHIEDQYRGFSMTQKVYSNRTGKFVGYVQVFHD 180 

Query: 179 LGRYYSMRQNI IVFLI^MEVLGTVLALVVINSATKRI^7RPVKNLHDLMHQISENPSNLEI 238 
LG YY +R ++ +L+++E+ GT LA ++I T+R ++P+ NLH++M ISENP+NL + 
Sbjct: 181 LGNYYVIRARLLFWLLVVELFGTSLAYLIILITTRRFLKPLHNLHEVMRNISENPNNLNL 240

Query: 239 RSKVRSEDEIGELSRIFDGMLDQLEDYTRRQSQFISDVSHELRTPVAVVKGHIGLLQRWG 298

RS + S DEI ELS IFD MLD+LE +T+ QS+FISDVSHELRTPVA++KGHIGLLQRWG 
Sbjct: 241 RSDISSGDEIEELSVIFDNMLDKLETHTKLQSRFISDVSHELRTPVAIIKGHIGLLQRWG 300 


Query: 299 KDDPEILEESLAAAYHEADRMSLMINDMLNMIRVQGSLELHQDEVTDLSSSISVVIENFR 358

KDD +ILEESL A HEADRM++MINDML+MIRVQGS E HQ+++T L SI V+ NFR 
Sbjct: 301 KDDSDILEESLTATAHEADRMAIMINDMLDMIRVQGSFEGHQNDMTVLEDSIETVVGNFR 360

Query: 359 ILREDFQFIFEMNISDIVWGKIYKIHFEQALMILIDNAIKYSPSYKEVSVVLSVDNDFAT 418

+LREDF F +++ + +IYK HFEQALMILIDNA+KYS K++++ LSV 

Sbjct: 361 VLREDFIFTWQSENPKTI-ARIYKNHFEQALMILIDNAVKYSRKEKKIAINLSVTGKQEA 419 

Query: 419 VV-VKDKGEGISDEDIEFIFDRFYRTDKSRNRESTQAGLGIGLSVFKQIMDAYHLKVDIK 477
+V V+DKGEGIS EDIE IF+RFYRTDKSRNR STQAGLGIGLS+ KQI+D YHL++ ++

Sbjct: 420 IVRVQDKGEGISKEDIEHIFERFYRTDKSRNRTSTQAGLGIGLSILKQIVDGYHLQMKVE 479 

Query: 478 SELNQGTEFIVRIPIKKFEET 498
SELN+G+ FI+ IP+ + +E+ 
Sbjct: 480 SELNEGSVFILHIPLAQSKES 500

A related GBS gene <SEQ ID 8845> and protein <SEQ ID 8846> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
SRCFLG: 0

McG: Length of UR: 5 

Peak Value of UR: 0.74 
Net Charge of CR: 2 
McG: Discrim Score: -10.19 
GvH: Signal Score (-7.5): -3.66

Possible site: 35 
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1 
ALOM program count: 2 value: -11.30 threshold: 0.0 
INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43)

INTEGRAL Likelihood = -9.66 Transmembrane 189 - 205 ( 187 - 212) 
PERIPHERAL Likelihood = 2.86 405 






modified ALOM score: 2.76 
icml HYPID: 7 CFP: 0.552 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco
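The "ALOM program count" and "value" lines in the GBS321 output above are consistent with a simple summary of the INTEGRAL lines: the count equals the number of INTEGRAL segments listed (2) and the value equals the most negative likelihood (-11.30); the same relationship holds for the HtpX-related output later in this section (count 3, value -11.30). The Python sketch below reproduces that summary from the printed text. This reading of the fields is inferred from the numbers shown here, not taken from ALOM documentation, and the function name `summarise_integral` is illustrative.

```python
from __future__ import annotations

import re

# Matches e.g. "INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43)".
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?[0-9.]+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def summarise_integral(text: str) -> tuple[int, float, list[tuple[int, int]]]:
    """Return (segment count, lowest likelihood, list of (start, end) segments)."""
    hits = INTEGRAL.findall(text)
    if not hits:
        raise ValueError("no INTEGRAL lines found")
    segments = [(int(start), int(end)) for _, start, end in hits]
    likelihoods = [float(value) for value, _, _ in hits]
    return len(hits), min(likelihoods), segments


gbs321 = """INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43)
INTEGRAL Likelihood = -9.66 Transmembrane 189 - 205 ( 187 - 212)
PERIPHERAL Likelihood = 2.86 405"""

print(summarise_integral(gbs321))
# (2, -11.3, [(22, 38), (189, 205)])
```

PERIPHERAL lines are deliberately ignored, since they do not enter the segment count in the output shown.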
SEQ ID 8846 (GBS321) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 173 (lane 6; MW 84kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 80 (lane 2; MW 58.7kDa). 

GBS321-GST was purified as shown in Figure 220, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1568 

A DNA sequence (GBSx1661) was identified in S.agalactiae <SEQ ID 4845> which encodes the amino
acid sequence <SEQ ID 4846>. This protein is predicted to be CsrR (trcR). Analysis of this protein
sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2649 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related DNA sequence was identified in S.pyogenes <SEQ ID 3259> which encodes the amino acid 
sequence <SEQ ID 3260>. Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3226 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 193/229 (84%) , Positives = 211/229 (91%) , Gaps = 1/229 (0%)

Query: 1 MGKKILIIEDEKNLARFVSLELLHEGYDVVVETNGREGLDTALEKDFDLILLDLMLPEMD 60 

M KKILI IEDEKNLARFVSLEL HEGY+V+VE NGREGL+TALEK+FDLILLDLMLPEMD 
Sbjct: 1 MTKKILIIEDEKNLARFVSLELQHEGYEVIVEVNGREGLETALEKEFDLILLDLMLPEMD 60 


Query: 61 GFEITRRLQAEKTTYIMMMTARDSVMDIVAGLDRGADDYIVKPFAIEELLARVRAIFRRQ 120 

GFE+TRRLQ EKTTYIMMMTARDS+MD+VAGLDRGADDYIVKPFAIEELLAR+RAIFRRQ
Sbjct: 61 GFEVTRRLQTEKTTYIMMMTARDSIMDVVAGLDRGADDYIVKPFAIEELLARIRAIFRRQ 120

Query: 121 EIETKTKEKGDSGSFRDLSl^HNRSAmGDEEISLTKREFDLLITOjMOTlNRVMTREEL 180

+IE++ K+ G +RDL LN NRS RGD+EISLTKRE+DLLN+LMTNMNRVMTREEL 
Sbjct: 121 DIESE-KKVPSQGIYRDLVLNPQNRSVNRGDDEISLTKREYDLLNILMTNMNRVMTREEL 179 

Query: 181 LEHVWKYDvAAETNWDVYIRYLRGKIDIPGRESYIQTVRGMGYVIREK 229 
L +VWKYD A ETNWDVYIRYLRGKIDIPG+ESYIQTVRGMGYVIREK

Sbjct: 180 LSNVWKYDEAVETNWDVYIRYLRGKIDIPGKESYIQTVRGMGYVIREK 228 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1569

A DNA sequence (GBSx1662) was identified in S.agalactiae <SEQ ID 4847> which encodes the amino
acid sequence <SEQ ID 4848>. Analysis of this protein sequence reveals the following: 
Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3864 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG32547 GB:U12643 YlbN-like hypothetical protein [Streptococcus gordonii]

Identities = 91/174 (52%) , Positives = 133/174 (76%) , Gaps = 3/174 (1%) 

Query: 3 LTEIKKSPEGLYFDKKIDIKESLMERHSEIMDISDIQVSGHVVYEDGLYLLDYNMAYDIT 62
+ EI+K+P+GL F+KK+D+ E L ER++EI+D+ DI SG YEDGLY LDY ++Y IT
Sbjct: 4 IQEIRKNPDGLAFEKKLDLAEELKERNAEILDVQDIVASGRAQYEDGLYFLDYELSYTIT 63

Query: 63 LPSSRSMKPVVLSEKQTINEVFIEAENVSTKKELVDQELVLILEEDDINLEESVIDNILL 122

L SSRSM+PV E +NE+F+E V++ +E++DQ+LVL +E +IN+ ESV DNILL 
Sbjct: 64 IASSRSMEPVERKESYLVNEIFMEDGQVAS-QEMIDQDLVLPIENGEINVAESVADNILL 122 


Query: 123 NIPLRVL-AADEVGVEADLSGKNWSLMTEKQYEEKQAKEKEKSNPFAALEGMFD 175 

NIPL+VL AA+E G + +G++W +MTE Y++ QA++KE+++PFA L+G+FD 
Sbjct: 123 NIPLKVLTAAEEAGSDLP-TGRDWQVMTEDDYQKYQAEKKEENSPFAGLQGLFD 175 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4849> which encodes the amino acid
sequence <SEQ ID 4850>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3032 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 86/175 (49%) , Positives = 135/175 (77%) 

Query: 1 MLLTEIKKSPEGLYFDKKIDIKESLMERHSEIMDISDIQVSGHVVYEDGLYLLDYNMAYD 60
+ ++EI+K P+GL FD+ D+K L+ER +I+DI ++ G+V Y+ GLYLLDY ++Y+ 
Sbjct: 3 LAISEIRKHPDGLSFDRLCDVKSMLLERDQQIIDIKAVKAVGNVRYDKGLYLLDYQLSYE 62

Query: 61 ITLPSSRSMKPVVLSEKQTINEVFIEAENVSTKKELVDQELVLILEEDDINLEESVIDNI 120

+ LPSSRSM PV LSE Q I E+FIEA +++ KKELV+ LVL+L++D INLEES++DNI
Sbjct: 63 VILPSSRSMVPVCLSEVQHIQELFIEATDI^KjKELVEDNLVLVLDKDAINLEESIVDNI 122


Query: 121 LLNIPLRVLAADEVGVEADLSGKNWSLMTEKQYEEKQAKEKEKSNPFAALEGMFD 175

LL IP++VL +E + +G+NW+++TE+ Y+ + ++++++NPFA+L+G+FD 
Sbjct: 123 LLAIPVQVLTEEEKKSKELPAGQNWAVLTEEDYQCLKEEKQKENNPFASLQGLFD 177 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1570 

A DNA sequence (GBSx1663) was identified in S.agalactiae <SEQ ID 4851> which encodes the amino
acid sequence <SEQ ID 4852>. This protein is predicted to be heat shock protein (htpX). Analysis of this
protein sequence reveals the following:

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.30 Transmembrane 195 - 211 ( 190 - 221) 
INTEGRAL Likelihood =-11.09 Transmembrane 43 - 59 ( 31 - 62)
INTEGRAL Likelihood = -3.61 Transmembrane 153 - 169 ( 153 - 174) 

Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB70525 GB:AF017421 putative heat shock protein HtpX

[Streptococcus gordonii] 
Identities = 220/297 (74%) , Positives = 261/297 (87%) , Gaps = 1/297 (0%) 

Query: 1 MLYQQIASNKRKTVVLLIVFFCLLAAIGAAVGYLVLGSYQFGLVLALIIGVIYAVSMIFQ 60
ML++QIA+NKR+T LL+ FF LLA IGAA GYL + S G+++A IIG+IYA++MIFQ

Sbjct: 1 MLFEQIAANKRRTWFLLVAFFALLALIGAAAGYLWMNSPLGGVIIAFIIGLIYAITMIFQ 60

Query: 61 STNVVMSMNNAREVTEDEAPNYFHIVEDMAMIAQIPMPRVFIVEDDSLNAFATGSKPENA 120
ST VVMSMN AR+V+E EAP +HIV+DMAM+AQIPMPRV+IVEDDS NAFATGS PENA
Sbjct: 61 STEVVMSMNGARQVSEQEAPELYHIVQDMAMVAQIPMPRVYIVEDDSPNAFATGSNPENA 120

Query: 121 AVAATTGLLAVMNREELEGVIGHEVSHIRNYDIRISTIAVALASAVTLISSIGSRMLFYG 180 

AVAATTGLL +MNREELEGVIGHEVSHIRNYDIRISTIAVALASA+T+ISS+ RM++YG 
Sbjct: 121 AVAATTGLLRLMNREELEGVIGHEVSHIRNYDIRISTIAVALASAITMISSVAGRMMWYG 180 


Query: 181 GGRRRDDDREDGG-NILVLIFSILSLILAPLAASLVQLAISRQREYLADASSVELTRNPQ 239 

GGRRR+D +D G +L+L+FS++++ILAPLAA+LVQLAISRQRE+LADASSVELTRNPQ 
Sbjct: 181 GGRRRNDRDDDSGLGLLMLVFSLIAIIIAPLAATLVQLAISRQREFLADASSVELTRNPQ 240 

Query: 240 GMISALEKLDRSEPMGHPVDDASAALYINDPTKKEGLKSLFYTHPPIADRIERLRHM 296

GMI AL+KLD SEPM VDDASAALYI+DP KK GL+ LFYTHPPI++R+ERLR M 
Sbjct: 241 GMIRALQKLDNSEPMHRHVDDASAALYISDPKKKGGLQKLFYTHPPISERVERLRKM 297

A related DNA sequence was identified in S.pyogenes <SEQ ID 4853> which encodes the amino acid
sequence <SEQ ID 4854>. Analysis of this protein sequence reveals the following:

Possible site: 31 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.77 Transmembrane 197 - 213 ( 192 - 223) 
INTEGRAL Likelihood = -8.33 Transmembrane 43 - 59 ( 33 - 61) 
INTEGRAL Likelihood = -3.82 Transmembrane 153 - 169 ( 153 - 174)

Final Results 

bacterial membrane Certainty=0.4906 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAB70525 GB:AF017421 putative heat shock protein HtpX [Streptococcus gordonii] 
Identities = 208/298 (69%) , Positives = 257/298 (85%) , Gaps = 1/298 (0%) 

Query: 1 MLYQQISQNKQRTVVLLVGFFALLALIGASAGYLLLDNYAMGLVLALVIGVIYATSMIFQ 60

ML++QI+ NK+RT LLV FFALLALIGA+AGYL +++ G+++A +IG+IYA +MIFQ 
Sbjct: 1 MLFEQIAANKRRTWFLLVAFFALLALIGAAAGYLWMNSPLGGVIIAFIIGLIYAITMIFQ 60 

Query: 61 STSLVMSMNNAREVTEKEAPGFFHIVEDMAMVAQIPMPRVFIIEDPSLNAFATGSSPQNA 120

ST +VMSMN AR+V+E+EAP +HIV+DMAMVAQIPMPRV+I+ED S NAFATGS+P+NA 
Sbjct: 61 STEVVMSMNGARQVSEQEAPELYHIVQDMAMVAQIPMPRVYIVEDDSPNAFATGSNPENA 120

Query: 121 AVAATTGLLEVMNREELEGVIGHEISHIRNYDIRISTIAVALASAVTVISSIGGRMLWYG 180 
AVAATTGLL +MNREELEGVIGHE+SHIRNYDIRISTIAVALASA+T+ISS+ GRM+WYG

Sbjct: 121 AVAATTGLLRLMNREELEGVIGHEVSHIRNYDIRISTIAVALASAITMISSVAGRMMWYG 180 






Query: 181 GGSRRQRDDGDDDVLRIITLLLSLLSLLLAPLVASLIQLAISRQREYLADASSVELTRNP 240 
GG RR+ D DD L ++ L+ SL++++LAPL A+L+QLAISRQRE+LADASSVELTRNP 
Sbjct: 181 GG-RRRNDRDDDSGLGLLMLVFSLIAIIIAPLAATLVQLAISRQREFLADASSVELTRNP 239

Query: 241 QGMIKALEKLQLSQPMKHPVDDASAALYINEPRKKRSFSSLFSTHPPIEERIERLKNM 298
QGMI+AL+KL S+PM VDDASAALYI++P+KK LF THPPI ER+ERL+ M 

Sbjct: 240 QGMIRALQKLDNSEPMHRHVDDASAALYISDPKKKGGLQKLFYTHPPISERVERLRKM 297

An alignment of the GAS and GBS proteins is shown below. 

Identities = 233/298 (78%), Positives = 262/298 (87%), Gaps = 2/298 (0%) 

Query: 1 MLYQQIASNKRKTVVLLIVFFCLLAAIGAAVGYLVLGSYQFGLVLALIIGVIYAVSMIFQ 60

MLYQQI+ NK++TVVLL+ FF LLA IGA+ GYL+L +Y GLVLAL+IGVIYA SMIFQ
Sbjct: 1 MLYQQISQNKQRTVVLLVGFFALLALIGASAGYLLLDNYAMGLVLALVIGVIYATSMIFQ 60

Query: 61 STNVVMSMNNAREVTEDEAPNYFHIVEDMAMIAQIPMPRVFIVEDDSLNAFATGSKPENA 120
ST++VMSMNNAREVTE EAP +FHIVEDMAM+AQIPMPRVFI+ED SLNAFATGS P+NA

Sbjct: 61 STSLVMSMNNAREVTEKEAPGFFHIVEDMAMVAQIPMPRVFIIEDPSLNAFATGSSPQNA 120

Query: 121 AVAATTGLLAVMNREELEGVIGHEVSHIRNYDIRISTIAVALASAVTLISSIGSRMLFYG 180
AVAATTGLL VMNREELEGVIGHE+SHIRNYDIRISTIAVALASAVT+ISSIG RML+YG
Sbjct: 121 AVAATTGLLEVMNREELEGVIGHEISHIRNYDIRISTIAVALASAVTVISSIGGRMLWYG 180

Query: 181 GG--RRRDDDREDGGNILVLIFSILSLILAPLAASLVQLAISRQREYLADASSVELTRNP 238
GG R+RDD +D I+ L+ S+LSL+LAPL ASL+QLAISRQREYLADASSVELTRNP
Sbjct: 181 GGSRRQRDDGDDDVLRIITLLLSLLSLLLAPLVASLIQLAISRQREYLADASSVELTRNP 240


Query: 239 QGMISALEKLDRSEPMGHPVDDASAALYINDPTKKEGLKSLFYTHPPIADRIERLRHM 296

QGMI ALEKL S+PM HPVDDASAALYIN+P KK SLF THPPI +RIERL++M 
Sbjct: 241 QGMIKALEKLQLSQPMKHPVDDASAALYINEPRKKRSFSSLFSTHPPIEERIERLKNM 298 

A related GBS gene <SEQ ID 8847> and protein <SEQ ID 8848> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 9.61 
GvH: Signal Score (-7.5): -0.97 
Possible site: 25 

>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 3 value: -11.30 threshold: 0.0 

INTEGRAL Likelihood =-11.30 Transmembrane 195 - 211 ( 190 - 221) 
INTEGRAL Likelihood =-11.09 Transmembrane 43 - 59 ( 31 - 62) 
INTEGRAL Likelihood = -3.61 Transmembrane 153 - 169 ( 153 - 174) 

PERIPHERAL Likelihood =5.89 87 
modified ALOM score: 2.76 
*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
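The INTEGRAL lines above list each predicted transmembrane segment with a likelihood and two residue ranges. A minimal Python sketch for pulling these segments out of such a report follows. The field names `core` and `window` are hypothetical; in particular, reading the parenthesised range as a wider window around the core segment is an assumption, since the report does not label it.

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    core: tuple[int, int]    # first residue range on the line
    window: tuple[int, int]  # parenthesised range (assumed to be a wider window)


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(report: str) -> list[TMSegment]:
    """Extract INTEGRAL (transmembrane) lines, ordered by position in the protein."""
    segments = []
    for m in _INTEGRAL.finditer(report):
        lik, start, end, w_start, w_end = m.groups()
        segments.append(
            TMSegment(float(lik), (int(start), int(end)), (int(w_start), int(w_end)))
        )
    return sorted(segments, key=lambda s: s.core[0])


report = """
INTEGRAL Likelihood =-11.30 Transmembrane 195 - 211 ( 190 - 221)
INTEGRAL Likelihood =-11.09 Transmembrane 43 - 59 ( 31 - 62)
INTEGRAL Likelihood = -3.61 Transmembrane 153 - 169 ( 153 - 174)
"""
segments = parse_integral(report)
for seg in segments:
    print(seg.core, seg.likelihood)
# Here the segment count (3) and the most negative likelihood (-11.30)
# agree with the "ALOM program count: 3 value: -11.30" summary line.
print(len(segments), min(s.likelihood for s in segments))
```

Sorting by start position lists the segments in N- to C-terminal order (43 - 59, 153 - 169, 195 - 211), which is easier to read than the likelihood order used in the report.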

The protein has homology with the following sequences in the databases: 

73.8/88.3% over 296aa 

SP|O30795| PUTATIVE HEAT SHOCK PROTEIN HTPX. Insert characterized 

GP|2407215|gb|AAB70525.1||AF017421 putative heat shock protein HtpX {Streptococcus gordonii} Insert characterized 

PIR|T48855|T48855 probable heat shock protein HtpX [imported] - Streptococcus gordonii Insert characterized 

ORF02338(301 - 1188 of 1488) 

SP|O30795|HTPX_STRGC(1 - 297 of 297) PUTATIVE HEAT SHOCK PROTEIN HTPX.GP|2407215|gb|AAB70525.1||AF017421 putative heat shock protein HtpX {Streptococcus gordonii}PIR|T48855|T48855 probable heat shock protein HtpX [imported] - Streptococcus gordonii 
%Match =44.0 

%Identity = 73.7 %Similarity = 88.2 

Matches = 219 Mismatches = 34 Conservative Sub.s = 43 

141 171 201 231 261 291 321 351 

NFLFTSVI*HMNIQL*CEIRNFPK*YCWKTIWQTKPILRNS*RRKRSAKSFL*LLIEKGERLLLYQQIASNKRKTVVLL 

:|::|||:Mhl II 
MLFEQIAANKRRTWFLL 




381 411 441 471 501 531 561 591 

IVFFCLIAAIGAAVGYLVLGSYQFGLVLALIIGVIYAVSMIFQSTNVVMSMNNAREVTEDEAPNYFHIVEDMAMIAQIPM 

= II III IIM III : I l = = :| = llhll|::|lllll llllll Ihhl III = I I h I I I h I I I I I 
VAFFALIALIGAAAGYLWMNSPLGGVIIAFIIGLIYAITMIFQSTEVVMSMNGARQVSEQEAPELYHIVQDMAMVAQIPM 

30 40 50 60 70 80 90 

621 651 681 711 741 771 801 831 

PRVFIVEDDSLNAFATGSKPENAAVAATTGLLAVMNREELEGVIGHEVSHIRNYDIRISTIAVALASAVTLISSIGSRML 

iihiinn iiiiiii iiiiiiiiiiiii nninninniiiiiiniiinniiiiihhiih in 

PRVYIVEDDSPNAFATGSNPENAAVAATTGLLRLMNREELEGVIGHEVSHIRNYDIRISTIAVALASAITMISSVAGRMM 
110 120 130 140 150 160 170 

861 888 918 948 978 1008 1038 1068 

FYGGGRRRDDDREDGG-NILVLIFSILSLILAPLAASLVQLAISRQREYLADASSVELTRNPQGMISALEKLDRSEPMGH 



WYGGGRRRNDRDDDSGLGLLMLVFSLIAIIIAPLAATLVQLAISRQREFLADASSVELTRNPQGMIRALQKLDNSEPMHR 
190 200 210 220 230 240 250 



1098 1128 1158 1188 1218 1248 1278 1308 

PVDDASAALYINDPTKKEGLKSLFYTHPPIADRIERLRHM*SLTKRRVAMPCVLFF*DKACKT*YNMTYTIKGDGTCYLQ 

Illllllllhll II Ih lllllllh:hllll I 
HVDDASAALYISDPKKKGGLQKLFYTHPPISERVERLRKM 
270 280 290 
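The statistics reported for this comparison with S. gordonii HtpX can be checked arithmetically. The three counts sum to 296, the overlap quoted in the summary line ("over 296aa"). The printed %Identity (73.7) and %Similarity (88.2) are reproduced if the denominator is 297, the full length of the S. gordonii entry ("1 - 297 of 297"). That denominator is inferred from the numbers, not stated by the program, and the summary line's 73.8/88.3% differs slightly by an unstated computation.

```python
matches, mismatches, conservative = 219, 34, 43

overlap = matches + mismatches + conservative  # aligned residue pairs
subject_length = 297                           # assumed denominator: "1 - 297 of 297"

pct_identity = round(100 * matches / subject_length, 1)
pct_similarity = round(100 * (matches + conservative) / subject_length, 1)
print(overlap, pct_identity, pct_similarity)
```

This prints `296 73.7 88.2`.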

SEQ ID 8848 (GBS179) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 175 (lane 11; MW 58 kDa). 
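The 58 kDa band lies in the range expected for a GST fusion of a protein of this size. A back-of-envelope check follows. It assumes the construct carries the full 296 residues of the GBS query in the GAS alignment, an average residue mass of about 110 Da (a common rule of thumb), and a GST tag of about 26 kDa. Linker residues are ignored, and none of these values is stated in the text.

```python
GST_TAG_KDA = 26.0       # approximate mass of the glutathione S-transferase tag (assumption)
MEAN_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def fusion_mass_kda(n_residues: int, tag_kda: float = GST_TAG_KDA) -> float:
    """Rough mass of a tag-protein fusion estimated from the protein length alone."""
    return tag_kda + n_residues * MEAN_RESIDUE_DA / 1000.0


print(f"{fusion_mass_kda(296):.1f} kDa")  # assumes the full-length 296-residue protein
```

This gives about 58.6 kDa. If the construct lacked the predicted N-terminal signal peptide (possible cleavage site near residue 25), the estimate would fall by roughly 2.75 kDa.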

GBS179-GST was purified as shown in Figure 227, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1571 

A DNA sequence (GBSx1665) was identified in S.agalactiae <SEQ ID 4855> which encodes the amino 
acid sequence <SEQ ID 4856>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-15.44 Transmembrane 4 - 20 ( 1 - 27) 
Final Results 

bacterial membrane Certainty=0.7177 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
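Each "Final Results" block assigns a certainty to three bacterial compartments, and the predicted location is the one with the highest value. A minimal Python sketch that reads such a block follows. The helper names are hypothetical. The output format resembles that of the PSORT program, but the text does not name the tool, so that attribution is an assumption. The sketch also tolerates whitespace around the decimal point, as in `0 . 0000`.

```python
import re

# Collapse whitespace inside decimals such as "0 . 0000" -> "0.0000".
_DECIMAL_GAP = re.compile(r"(\d)\s*\.\s*(\d)")
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\s+"
    r"Certainty\s*=\s*(\d+\.\d+)\s*\((Affirmative|Not Clear)\)"
)


def localisation_scores(report: str) -> dict[str, float]:
    """Map each bacterial compartment to its certainty value."""
    text = _DECIMAL_GAP.sub(r"\1.\2", report)
    return {site: float(score) for site, score, _status in _RESULT.findall(text)}


def predicted_site(report: str) -> str:
    """Return the compartment with the highest certainty."""
    scores = localisation_scores(report)
    if not scores:
        raise ValueError("no 'Final Results' lines found")
    return max(scores, key=scores.get)


example = """
bacterial membrane Certainty=0. 7177 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
"""
print(predicted_site(example), localisation_scores(example)["membrane"])
```

For the GBSx1665 protein this prints `membrane 0.7177`, in line with the single INTEGRAL segment at residues 4 - 20.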

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG23700 GB:AF017421 LemA-like protein [Streptococcus gordonii] 
Identities = 124/182 (68%), Positives = 152/182 (83%) 

Query: 1 MGTMILIAIIALWIVaiVAYNSLWSRMHTKESWSQIDVQLKRRNDLIPNLIETVKGYA 60 

M +1 IA+I + V+++I YNSLVR+RM T+E+WSQIDVQLKRRNDL+PNLIETVKGY 
Sbjct: 1 MSFIITIAVIWIVLFVISVYNSLWARMQTQEAWSQIDVQLKRRNDLLPNLIETVKGYG 60 



Query: 61 AYEGKTLEKIAELRAQVAKANTPAEAMTASNELTRQLSSILAVAENYPDLKANNSFVKLQ 120 



